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1.  INTRODUCTION 


The  7th  International  Conference  on  Field  and  Service  Robotics  (FSR  2009)  was  held  on 
the  MIT  Campus,  in  Cambridge,  Massachusetts,  USA  on  July  14-16  2009. 

The  goal  of  FSR  is  to  report  and  encourage  the  development  of  field  and  service  robotics. 
These  are  non-factory  robots,  typically  mobile,  that  must  operate  in  complex  and 
dynamic  environments.  Typical  field  robotics  applications  include  defense,  mining, 
agriculture,  building  and  construction,  forestry,  cargo  handling  and  so  on.  Field  robots 
may  operate  on  the  ground  (of  Earth  or  planets),  under  the  ground,  underwater,  in  the  air 
or  in  space.  Service  robots  are  those  that  work  closely  with  humans,  especially  the 
elderly  and  sick  to  help  improve  their  lives.  The  first  meeting  was  held  in  Canberra, 
Australia,  in  1997.  Since  then  the  meeting  has  been  held  every  2  years  in  the  pattern  of 
Asia,  America,  Europe. 


2.  REPORT  ON  CONFERENCE  OUTCOME 

Conference  attendance  exceeded  the  organizer’s  expectations,  with  101  attendees, 
including  41  students,  and  representatives  from  the  U.S.  Army  ERDC,  REDCOM, 
CASCOM,  DARPA,  Naval  Undersea  Warfare  Systems.  Attending  organizations 
including  the  German  Aerospace  Agency  (DLR)  and  the  NASA  Jet  Propulsion 
Laboratory,  and  corporations  included  iRobot,  Boston  Dynamics,  and  John  Deere. 
Conference  attendees  hailed  from  the  United  States,  Canada,  Europe,  Asia,  and  Australia. 

44  Technical  papers  were  presented  at  the  conference,  with  topics  including  robot  control, 
learning  for  mobile  systems,  sensor  data  processing,  mechanism  design,  issues  related  to 
autonomous  underwater  and  aerial  vehicles,  search  and  rescue  robots,  mining  robots,  and 
many  more.  These  papers  are  currently  being  collected  in  a  bound  hardcover  volume  that 
will  be  published  by  Springer  as  part  of  their  STAR  series.  A  subset  of  the  conference 
papers  will  be  collected  in  a  special  issue  of  the  International  Journal  of  Robotics 
Research,  to  be  published  in  late  2010,  and  edited  by  the  conference  co-organizers  (i.e. 
Karl  Iagnemma  of  MIT,  Alonzo  Kelly  of  CMU,  and  Andrew  Howard  of  JPL).  A 
complete  set  of  papers  from  the  conference  is  included  as  an  appendix  to  this  report,  as  is 
the  conference  program. 

Keynote  talks  were  presented  by  Marc  Raibert  of  Boston  Dynamics,  Peter  Wurman  of 
Kiva  Systems,  Christian  Laugier  of  INRIA,  and  Shigeo  Hirose  of  the  Tokyo  Institute  of 
Technology,  and  were  extremely  well  received. 

By  offsetting  various  costs  related  to  conference  organization,  support  from  the  Army 
allowed  us  to  offer  a  low  student  registration  price  of  $450.  It  also  helped  support  travel 
costs  for  the  invited  keynote  speakers.  Overall,  Army  support  was  of  tremendous 
importance  in  making  the  conference  a  success.  The  Army  was  acknowledged  for  their 
support  at  the  conference  opening,  in  the  printed  program,  and  on  large  display  boards 
outside  the  conference  meeting  room.  Other  financial  support  was  provided  by  the  iRobot 
corporation. 
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2. 1  POST-CONFERENCE  A CTIVITIES 

The  papers  presented  at  the  FSR  09  conference  are  currently  in  the  process  of  being 
published  in  a  hardcover  proceedings  volume  by  Springer,  with  an  estimated  publication 
date  of  Fall,  2010.  The  conference  organizers  have  worked  with  Springer  to  facilitate  the 
publication  process. 

In  addition,  a  selection  of  papers  (edited  by  the  conference  organizers)  have  been  invited 
for  submission  to  a  special  issue  of  the  International  Journal  of  Robotics  Research  that 
will  focus  solely  on  the  FSR  ’09  conference.  As  of  the  time  of  the  writing  of  this  report, 
nearly  all  of  the  selected  papers  have  successfully  gone  through  the  review  process  and 
are  being  prepared  for  press.  The  IJRR  special  issue  will  be  published  in  Fall  2010. 


3.  PICTURES  FROM  FSR  09  CONFERENCE 


Conference  attendees  at  MIT  Stata  center 
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Conference  attendees  at  registration  in  MIT  Stata  center 


Alonzo  Kelly  (co-organizer)  announcing  conference  opening 
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Conference  attendees  boarding  vessel  for  harbor  cruise  social  event 


Conference  attendees  on  harbor  cruise  social  event 
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Conference  attendees  at  dinner  at  the  Top  of  the  Hub  in  Boston 


Conference  attendees  at  dinner  at  the  Top  of  the  Hub  in  Boston 
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Conference  Information 

Plenary  talks  and  Sessions  will  take  place  in  Room  32-141  on  the  first  floor  of  the  MIT  Ray  and 
Maria  Stata  Center,  Building  32. 

Breakfast,  Lunch,  and  Breaks  take  place  in  Student  Street,  first  floor,  Stata  Center.  Seating  is 
available  throughout  the  first  floor  lobby. 

The  Conference  Desk  will  be  will  be  staffed  for  registration  and  information  services  on  the  first 
floor  lobby  of  the  Stata  Center  from  7:45am-4pm  Tuesday-Thursday 

Please  be  sure  to  wear  your  badge  to  all  conference  sessions  and  social  events. 

Internet: 

MIT  offers  short-term  network  service  to  campus  guests.  Your  computer  needs  to  be 
configured  for  DHCP  (obtaining  an  IP  address  automatically).  If  the  machine  is  running  firewall 
software,  it  will  need  to  be  disabled  until  the  registration  process  is  complete. 

Once  your  equipment  is  ready,  open  a  web  browser  and  point  it  to  any  web  page.  After 
selecting  Visitor  registration  (do  not  select  Conference),  the  returned  page  will  display  the 
MITnet  Rules  of  Use,  followed  by  a  registration  screen,  requesting  your  contact  information, 
number  of  days  of  connectivity,  and  the  event  for  which  you  are  on  campus.  Visitors  can 
register  for  one  to  five  (consecutive)  days  at  a  time,  up  to  fourteen  days  per  year.  The  network 
connection  takes  about  ten  minutes  to  activate  and  remains  active  for  the  number  of  days 
selected. 

For  more  information  on  connecting  to  the  network,  see 
http://web.mit.edu/is/topics/network/ 

Boston  Harbor  Cruise  -  Tuesday ;  July  14 

If  you  indicated  you  would  attend  the  cruise  and  dinner,  there  will  be  a  ticket  behind  your 
name  badge.  Please  wear  your  name  badge  and  bring  your  ticket  to  this  event.  All  beverages 
(soft  drinks  /beer/wine)  are  complimentary.  Transportation  to  the  cruise  will  be  provided; 
buses  will  depart  at  5:15PM. 

Banquet  -  Top  of  the  Hub  -  Wednesday ;  July  15 

If  you  indicated  you  would  attend  the  banquet,  there  will  be  a  ticket  behind  your  name  badge, 
along  with  your  meal  selection  and  two  drink  tickets  for  beer  and  wine.  A  cash  bar  will  be 
available  for  anyone  who  wishes  to  purchase  additional  beer  or  wine.  All  non-alcoholic 
beverages  are  complimentary.  Please  show  your  meal  choice  ticket  to  the  wait  staff  at  the 
dinner.  Transportation  to  the  banquet  will  be  provided;  buses  will  depart  at  5:30PM. 

Buses  for  both  the  cruise  and  banquet  depart  from  Stata  Center  entrance  closest  to  the  corner 
of  Vassar  and  Main  Streets.  At  the  end  of  the  evening  buses  will  depart  from  the  event  and 
drop  off  at  the  Stata  Center,  Le  Meridien  Hotel,  225  Albany  St  (Warehouse  Dormitory),  and 
Amherst  St  (for  those  staying  at  New  House  Dormitory). 

Please  note:  In  the  state  of  Massachusetts,  you  must  be  21  years  of  age  or  older  to  legally 
consume  alcohol,  and  proper  identification  is  required. 
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Parking 

There  is  no  conference  parking  available  on  the  MIT  campus.  Public  parking  is  available  in  the 
garage  next  to  the  Cambridge  Marriott  Hotel  at  4  Cambridge  Center.  There  is  also  a  small 
public  lot  on  the  corner  of  Vassar  Street  and  Massachusetts  Avenue.  Rates  are  approximately 
$24.00  per  day. 

Public  Transportation 

There  are  a  number  of  trains  and  buses  that  provide  public  transportation  to  the 
Cambridge/Boston  area.  The  No.  1  bus  stops  at  MIT's  main  entrance  (77  Massachusetts 
Avenue)  and  provides  service  to  Harvard  Square  and  downtown  Boston.  The  CT1  bus  departs 
from  Central  Square  and  also  stops  at  MIT's  main  entrance.  The  CT2  bus  departs  from  Kendall 
Square  and  stops  at  the  corner  of  Vassar  and  Massachusetts  Avenue.  One-way  bus  fare  is 
$1.50  CharlieTicket/cash-on-board;  $1.25/  pre-purchased  CharlieCard. 

The  MBTA  Red  Line  provides  train  service  to  the  MIT  area  via  stops  at  Central  Square  (on 
Massachusetts  Avenue)  and  Kendall/MIT  (on  Main  Street).  Both  stops  are  approximately  a  10- 
minute  walk  to  the  conference  site.  Subway  fare  is  $2.00  CharlieTicket/cash-on-board;  $1.70/ 
pre-purchased  CharlieCard.  Most  public  transportation  systems  run  between  the  hours  of  5:30 
AM  and  12:30  AM.  You  may  find  more  information,  including  information  on  purchasing  a 
CharlieCard,  at  the  MBTA  web  site:  http://www.mbta.com 

Taxis 

Taxis  are  available  during  the  day  at  77  Massachusetts  Avenue  (MIT’s  main  entrance)  or  at  the 
Broadway  entrance  of  the  Marriott  Hotel.  You  may  also  call  Ambassador  Cab,  617-492-1100  or 
Checker  Cab,  617-497-9000. 

Emergency  Services 

If  you  need  emergency  assistance  while  on  campus,  dial  100  from  any  campus  telephone  (off- 
campus/cell:  617-253-1212). 

Smoking  Policy 

Smoking  is  prohibited  in  all  spaces  of  all  MIT  buildings. 
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Tuesday  July  14th 

7:45  Breakfast  and  Registration 

8:30  Plenary  talk:  Mark  Raibert 

9:30  Session  "Mechanism  Design": 

•  Terrain  Modeling  and  Following  Using  a  Compliant  Manipulator  for 
Humanitarian  Demining  Applications 

Marc  Freese ,  Surya  P.  N.  Singh ,  William  Singhose ,  Edwardo  F.  Fukushima, 
Shigeo  Fiirose 

•  Towards  Autonomous  Wheelchair  Systems  in  Urban  Environments 
Chao  Gao ,  Michael  Sands ;  John  Spletzer 

•  Tethered  Detachable  Hook  for  the  Spiderman  Locomotion  (Design  of  the 
Hook  and  its  Launching  Winch) 

Nobukazu  Asano ,  Flideichi  Nakamoto ,  Tetsuo  Hagiwara ,  Shigeo  Hirose 
10:30  Break 

10:50  Session  "Perception  and  Control": 

•  Experimental  Study  of  an  Optimal-Control-Based  Framework  for  Trajectory 
Planning,  Threat  Assessment,  and  Semi-Autonomous  Control  of  Passenger 
Vehicles  in  Hazard  Avoidance  Scenarios 

Sterling  J  Anderson,  Steven  C.  Peters ,  Tom  E.  Pilutti,  and  Karl  lagnemma 

•  Receding  Horizon  Model-Predictive  Control  for  Mobile  Robot  Navigation  of 
Intricate  Paths 

Thomas  M.  Howard,  Colin  J.  Green,  and  Alonzo  Kelly 

•  Posterior  Probability  Estimation  Techniques  Embedded  in  a  Bayes  Filter  for 
Vibration-based  Terrain  Classification 

Philippe  Komma  and  Andreas  Zell 

•  Towards  Visual  Arctic  Terrain  Assessment 
Stephen  Williams  and  Ay  anna  M.  Howard 

12:10  Lunch 
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13:20  Plenary  talk:  Shigeo  Hirose 
14:20  Session  "Tracking  and  Servoing": 

•  Pedestrian  Detection  and  Tracking  Using  Three-Dimensional  LADAR  Data 
Luis  E.  Navarro-Serment ,  Christoph  Mertz,  and  Martial  Hebert 

•  Passive,  long-range  detection  of  Aircraft:  Towards  a  field  deployable  Sense 
and  Avoid  System 

Debadeepta  Dey,  Christopher  Geyer,  Sanjiv  Singh ,  Matt  Digioia 

•  Multiclass  Multimodal  Detection  and  Tracking  in  Urban  Environments 
Luciano  Spinello ,  Rudolph  Triebel,  and  Roland  Siegwart 

•  Vision-Based  Vehicle  Trajectory  Following  with  Constant  Time  Delay 
Hien  K.  Goi,  Timothy  D.  Barfoot,  Bruce  A.  Francis ;  and  Jared  L.  Giesbrecht 

15:40  Break 

16:00  Session  "Mining  Robotics": 

•  Swing  trajectory  control  for  large  excavators 

A.  W.  Denman ,  P.  R.  McAree,  M.  P.  Kearney ;  A.  W.  Reid,  and  K.  J.  Austin 

•  The  Development  of  a  Telerobotic  Rock  Breaker 

Elliot  Duff,  Con  Caris,  Adrian  Bonchis,  Ken  Taylor,  Chris  Gunn  and  Matt 
Adcock 

•  Camera  and  LIDAR  Fusion  for  Mapping  of  Actively  Illuminated  Subterranean 
Voids 

Uland  Wong,  Ben  Garney,  Warren  Whittaker  and  Red  Whittaker 
17:00  Break/travel 

18:00  Boston  Harbor  Cruise  and  Dinner  on-board  the  Lexington 
World  Trade  Center 

Transportation  is  provided  and  leaves  at  17:15 
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Wednesday  July  15th 

7:45  Breakfast  and  Registration 
8:30  Plenary  talk:  Christian  Laugier 

9:30  Session  "Human  Robot  Interaction": 

•  Bandit-Based  Online  Candidate  Selection  for  Adjustable  Autonomy 
Boris  Sofmon ,  J.  Andrew  Bagnell,  and  Anthony  Stentz 

•  Using  Virtual  Articulations  to  Operate  High-DoF  Inspection  and  Manipulation 
Motions 

Marsette  Vona,  David  Mitt  man,  Jeffrey  S.  Norris,  and  Daniela  Rus 

•  Field  Experiment  on  Multiple  Mobile  Robots  conducted  in  an  Underground 
Mall 

Tomoaki  Yoshida,  Keiji  Nagatani,  Eiji  Koyanagi,  Yasushi  Hada,  Kazunori  Ohno, 
Shoichi  Maeyama,  Hidehisa  Akiyama,  and  Satoshi  Tadokoro 

10:30  Break 

10:50  Session  "Marine  Robot  Design": 

•  A  Communication  Framework  for  Cost-effective  Operation  of  AUVs  in 
Coastal  Regions 

Arvind  Pereira,  Hordur  Heidarsson,  Carl  Oberg,  David  A.  Caron,  Burton  Jones 
and  GauravS.  Sukhatme 

•  Multi-Robot  Collaboration  with  Range-Limited  Communication:  Experiments 
with  Two  Underactuated  ASVs 

Filippo  Arrichiello,  Jnaneshwar  Das,  Hordur  Heidarsson,  Arvind  Pereira, 
Stefano  Chiaverini,  GauravS.  Sukhatme 

•  A  simple  reactive  obstacle  avoidance  algorithm  and  its  application  in 
Singapore  Harbour 

Tirthankar  Bandy opadhy ay,  Lynn  Sarcione,  Franz  Hover 
11:50  Lunch 
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13:00  Session  "Underwater  Localization  and  Mapping": 

•  Trajectory  Design  for  Autonomous  Underwater  Vehicles  based  on  Ocean 
Model  Predictions  for  Feature  Tracking 

Ryan  N.  Smith ,  Yi  Chao ,  Burton  H.  Jones ;  David  A.  Caron ,  Peggy  P.  Li  and 
Gaurav  S.  Sukhatme 

•  AUV  Benthic  Habitat  Mapping  in  South  Eastern  Tasmania 
Stefan  B.  Williams ,  Oscar  Pizarro,  Michael  Jakuba,  Neville  Barrett 

•  Sensor  Network  Based  AUV  Localisation 
David  Prasser  and  Matthew  Dunbabin 

•  Experiments  in  Visual  Localisation  around  Underwater  Structures 
Stephen  Nuske,  Jonathan  Roberts ,  David  Prasser  and  GordonWyeth 

14:20  Break 

14:40  Session  "Multi-Robot  Cooperation": 

•  Leap-Frog  Path  Design  for  Multi-Robot  Cooperative  Localization 
Stephen  Tally,  George  Kantor,  and  Howie  Choset 

•  A  Location-Based  Algorithm  for  Multi-hopping  State  Estimations  within  a 
Distributed  Robot  Team 

Brian  J.  Julian,  Mac  Schwager,  Michael  Angermann,  and  Daniela  Rus 

•  Cooperative  AUV  Navigation  using  a  Single  Surface  Craft 
Maurice  F.  Fallon,  Georg ios  Papadopoulos  and  John  J.  Leonard 

•  Multi-Robot  Fire  Searching  in  Unknown  Environment 

Ali  Marjovi,  Joao  Goncalo  Nunes,  Lino  Marques  and  Anba I  de  Almeida 

16:00  Break 

16:20  Session  "Service  Robotics": 

•  New  Measurement  Concept  for  Forest  Harvester  Head 
Mikko  Miettinen,  Jakke  Kulovesi,  Jouko  Kalmari  and  Arto  Visa  la 

•  Expliner:  Toward  a  Practical  Robot  for  Inspection  of  High-Voltage  Lines 
Paulo  Debenest,  Michele  Guarnieri,  Kenskue  Takita,  Edwardo  F.  Fukushima , 
Shigeo  Hirose,  Kubokawa,  Narumi  Iwama,  Fuminori  Shiga,  Kiyoshi  Tamura, 
Akihiro  Kimura 

17:00  Break/travel 
18:00  Banquet,  Top  of  the  Hub 
Prudential  Center 

Transportation  is  provided  and  leaves  at  17:30 

7 


Thursday  July  16th 

7:45  Breakfast  and  Registration 

8:30  Plenary  talk:  Peter  Wurman 

9:30  Session  "Planetary  Robotics": 

•  Model  Predictive  Control  for  Mobile  Robots  with  Actively  Reconfigurable 
Chassis 

P.  Michael  Furlong ,  Thomas  M.  Howard and  David  Wettergreen 

•  Turning  Efficiency  Prediction  for  Skid  Steer  Robots  Using  Single  Wheel 
Testing. 

Daniel  Flippo ,  Richard  Heller  and  David  P.  Miller 

•  Field  Experiments  in  Mobility  and  Navigation  with  a  Lunar  Rover  Prototype 
David  Wettergreen ,  Dominic  Jonak,  David  Kohanbash,  Scott  Moreland 
Spencer  Spiker,  and  James  Teza 

10:30  Break 

10:50  Session  "Localization": 

•  Radar  Scan  Matching  SLAM  using  the  Fourier-Mellin  Transform 

P.  Checchin ,  F.  Gerossier,  C.  Blanc ,  R.  Chapuis  and  L  Trassoudaine 

•  An  Automated  Asset  Locating  System  (AALS)  with  Applications  to  Inventory 
Management 

Thomas  H.  Miller ,  David  A.  Stolfo ,  and  John  R.  Spletzer 

•  Active  SLAM  and  Loop  Prediction  with  the  Segmented  Map  Using  Simplified 
Models 

Nathaniel  Fairfield  and  David  Wettergreen 

•  Outdoor  Downward-facing  Optical  Flow  Odometry  with  Commodity  Sensors 
Michael  Dille,  Ben  Grocholsky,  and  Sanjiv  Singh 

•  Place  Recognition  using  Regional  Point  Descriptors  for  3D  Mapping 
Michael  Bosse  and  Robert  Zlot 

12:30  Lunch 


8 


13:40  Session  "Mapping": 

•  Scan-point  Planning  and  3-D  Map  Building  for  a  3-D  Laser  Range  Scanner  in 
an  Outdoor  Environment 

Keiji  NAGATANI,  Takayuki  Matsuzawa,  and  Kazuya  Yoshida 

•  Image  and  Sparse  Laser  Fusion  for  Dense  Scene  Reconstruction 
Alastair  Harrison ,  Paul  Newman 

•  Relative  Motion  Threshold  for  Rejection  in  ICP  Registration 

Francois  Pomerleau ,  Francis  Colas ,  Francois  Ferland  and  Francois  Michaud 

•  Rover-Based  Surface  and  Subsurface  Modeling  for  Planetary  Exploration 
Paul  Furgale ,  Tim  Barfoot,  and  Nadeem  Ghafoor 

15:00  Break 

15:20  Session  "Learning  and  Cognition": 

•  Learning  to  identify  users  and  predict  their  destination  in  a  robotic  guidance 
application 

Xavier  Perrin ,  Francis  Colas ,  Cedric  Pradalier  and  Roland  Siegwart 

•  Applied  Imitation  Learning  for  Autonomous  Navigation  in  Complex  Natural 
Terrain 

David  Silver ,  J.  Andrew  Bagnell Anthony  Stentz 

•  Distributed  Sensing  and  Situation  Aware  Mobile  Robots 
Fulvio  Mastrogiovanni,  Antonio  Sgorbissa ,  and  Renato  Zaccaria 

•  Long  Term  Learning  and  Online  Robot  Behavior  Adaptation  for  Individuals 
with  Physical  and  Cognitive  Impairments 

Adriana  Tapus,  Cristian  Tapus  and  Maja  Mataric 

16:40  Break 

17:00  Closing  Reception,  Muddy  Charles  Pub 

MIT  Walker  Memorial  Building  (Building  50) 
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Plenary  Abstracts 


Tuesday  Morning  July  14th  -  Marc  Raibert 


Founder 

Boston  Dynamics 


What  is  next  for  BigDog ,  the  rough-terrain  robot? 

BigDog  has  hiked  in  rocky,  muddy  and  snowy  terrain, 
traveled  over  12  miles  without  stopping,  visually  followed  a 
human  leader,  and  done  limited  autonomous  cross  country 
travel.  So  what's  next  for  BigDog?  This  talk  will  give  a 
roadmap. 


j_h 

Tuesday  Afternoon  July  14  -  Shigeo  Hirose 

Professor 

Department  of  Mechanical  and  Aerospace  Engineering 
Tokyo  Institute  of  Technology 

Our  latest  topics  on  field  and  service  robotics 

I  will  introduce  some  of  our  latest  topics  related  to  the 
field  and  service  robotics,  e.g.  arm  mounted  buggy 
robot  "Gryphon  V"  for  humanitarian  demining, 
quadruped  walking  robot  "TITAN  XI"  for  steep  slope 
construction  tasks,  multi-wheeled  "Expliner"  for  the 
inspection  of  high-voltage  transmission  lines,  and 
pneumatic-driven  weight-balancing  arm  "Float-Arm"  for 
the  human  cooperative  automobile  assembly  task.  The  buggy  vehicle  of  the  "Gryphon 
V"  is  mounting  an  arm  with  wide  motion  range  and  other  devices  for  mine  detection. 
Through  several  experiments  conducted  on  mine  test  sites  in  Cambodia  and  Croatia, 
the  Gryphon  V  already  showed  high  robustness  and  better  performance  of  mine 
detection  than  that  of  human  deminers.  The  7  ton  world  largest  quadruped  walking 
robot  "TITAN-XI"  can  walk  around  on  the  surface  of  steep  slope  by  avoiding  the 
damage  of  the  lattice-like  ferroconcrete  frames  by  using  a  pair  of  suspension  wires, 
new  vision  system  and  intelligent  gait  control  system.  "Explainer"  is  now  developing 
to  make  the  detail  inspection  of  high-voltage  lines  to  prevent  blackout  during 
inspection  task.  It  already  succeeded  to  go  over  obstacles  by  changing  the  CG  of  its 
multi-joint  body  and  to  move  around  on  500K  voltage  lines.  From  the  introduction  of 
these  examples,  I  will  finally  try  to  extract  some  of  the  basic  design  principles  for  field 
and  service  robots. 
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J_h 

Wednesday  Morning  July  15  -  Christian  Laugier 
Research  Director 
INRIA  Rhone-Alpes 

Human ,  Dynamic  and  Open  Environments:  A  New  Challenge 
for  Robotics 

One  of  the  dreams  of  researchers  in  Robotics  is  to  develop 
robots  having  the  capability  to  "share"  the  human  living 
space.  Thanks  to  the  recent  technological  progress  in  sensor 
technologies,  robotics  technologies,  miniaturized 
mechatronic  systems,  and  embedded  computational 
power,  this  dream  is  gradually  becoming  a  reality.  In  the 
past  few  years,  some  autonomous  robots  have  already 
been  successfully  immersed  in  realistic  human 
environments  such  as  museums  (robots  for  guiding  visitors)  or  urban  road 
environments  populated  by  both  autonomous  vehicles  and  cars  driven  by  human 
beings  (the  2007  DARPA  Urban  challenge). 

However,  the  successful  deployment  of  autonomous  robots  among  human  beings  is 
still  a  real  challenge:  having  some  successful  large  experiments  in  realistic  human 
environments  is  clearly  a  necessary  but  insufficient  step;  indeed,  major  issues  such  as 
"Robustness  to  uncertainty"  and  "Safety"  have  to  be  more  deeply  addressed. 
Consequently,  previous  approaches  have  to  be  deeply  revisited  with  this  point  of 
view,  and  new  models  and  methods  have  to  be  designed  in  three  main 
complementary  research  directions:  (1)  Robust  perception  and  understanding  of  the 
surrounding  environment,  which  is  highly  dynamic,  open,  and  partially  known,  e.g. 
some  moving  obstacles  having  unknown  behaviours  might  suddenly  show  up;  (2) 
Autonomous  navigation  with  a  special  emphasis  on  the  Safety  issue,  i.e.  how  to 
guarantee  a  given  level  of  safety  when  making  goal-oriented  navigation  decisions  in 
the  presence  of  real  world  uncertainty;  and  (3)  Impact  of  the  human  factor  in  the 
involved  decision  processes. 

In  this  talk,  these  three  issues  will  be  addressed  and  the  most  promising  current 
developments  and  results  in  the  related  technical  areas  will  be  presented.  More 
precisely,  difficulties  and  approaches  to  solving  the  following  problems  will  be 
discussed:  Robust  detection  and  tracking  of  moving  entities  in  urban  road 
environments;  Prediction  of  the  future  behaviours  of  the  detected  moving  entities 
and  probabilistic  evaluation  of  the  related  risk  of  collision;  Safe  goal-oriented 
navigation  decisions  in  uncertain  environments;  Human-robot  interaction  for  the 
shared  control  of  an  autonomous  mobile  robot. 
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J_h 

Thursday  Morning  July  16  -  Peter  Wurman 

Engineering  Fellow,  Software  Architecture 

KIVA  Systems 

Associate  Professor 

Department  of  Computer  Science 

North  Carolina  State  University 

Moving  robotics  out  of  the  lab  and  into  commercial 
systems  presents  a  myriad  of  technical  and  business 
challenges.  Kiva  systems  has  developed  a  warehouse 
management  system  that  employs  hundreds  of 
mobile  robots  to  move  shelving  units  around 
distribution  centers  run  by  companies  like  Staples, 
Walgreens,  and  Zappos.  With  deployed  systems 
exceeding  500  robots  in  a  single  facility,  Kiva's 
warehouses  represent  the  largest  working  collections 
of  mobile  robotic  systems  in  the  world.  As  chief 
architect  and  cofounder  of  Kiva  Systems,  Pete  has 
worked  on  many  of  the  coordination  and  control 
problems  inherent  in  the  technology  and  its 
application  to  warehousing.  In  this  talk,  Pete  will 
discuss  some  of  the  challenges  the  team  has  faced  in 
bringing  the  Kiva  system  to  market. 
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Paper  Abstracts 


Tuesday  July  14th  9:30-10:30  -  Mechanism  Design 

o  Terrain  Modeling  and  Following  Using  a  Compliant  Manipulator  for  Humanitarian 
Demining  Applications 

Marc  Freese ,  Surya  P.  N.  Singh ,  William  Singhose,  Edwardo  F.  Fukushima,  Shigeo  Flirose 
Operations  with  flexible,  compliant  manipulators  over  large  workspaces  relative  to  the  manipulator  are 
complicated  by  sensor  noise,  vibration,  and  measurement  bias.  These  difficulties  are  compounded  by 
unstructured  environments,  such  as  those  encountered  in  humanitarian  demining.  By  taking  advantage 
of  the  static  structure  of  the  terrain  and  the  manipulator's  fundamental  mechanical  characteristics,  a 
series  of  adaptive  corrections  and  filters  refine  noisy  topographical  measurements.  These  filters  along 
with  a  shaped  actuation  scheme  can  generate  smooth  and  well-controlled  trajectories.  Experimental 
testing  was  performed  on  a  field  robot  with  a  compliant,  3  m  long-reach  hybrid  manipulator  and  a 
stereo  vision  system  for  terrain  sensing.  The  proposed  method  provides  a  vertical  tracking  precision  of 
±5  mm  on  a  variety  of  ground  clearings,  with  scanning  speeds  of  up  to  0.5  m/s.  As  such,  it  can  agilely 
move  the  attached  sensor(s)  through  precise  scanning  trajectories  that  are  very  close  to  the  ground. 
This  method  improves  overall  detection  and  generation  of  precise  maps  of  suspected  mine  locations. 

o  Towards  Autonomous  Wheelchair  Systems  in  Urban  Environments 
Chao  Gao >  Michael  Sands,  and  John  R.  Spletzer 

In  this  paper,  we  explore  the  use  of  synthesized  landmark  maps  for  absolute  localization  of  a  smart 
wheelchair  system  outdoors.  In  this  paradigm,  three-dimensional  map  data  are  acquired  by  an 
automobile  equipped  with  high  precision  inertial/GPS  systems,  in  conjunction  with  light  detection  and 
ranging  (LIDAR)  systems,  whose  range  measurements  are  subsequently  registered  to  a  global  coordinate 
frame.  The  resulting  map  data  are  then  synthesized  a  priori  to  identify  robust,  salient  features  for  use  as 
landmarks  in  localization.  By  leveraging  such  maps  with  landmark  meta-data,  robots  possessing  far 
lower  cost  sensor  suites  gain  many  of  the  benefits  obtained  from  the  higher  fidelity  sensors,  but  without 
the  cost.  We  show  that  by  using  such  a  map-based  localization  approach,  a  smart  wheelchair  system 
outfitted  only  with  a  2-D  LIDAR  and  encoders  was  able  to  maintain  accurate,  global  pose  estimates 
outdoors  over  almost  1  km  paths. 

o  Tethered  Detachable  Hook  for  the  Spiderman  Locomotion  (Design  of  the  Hook  and 
its  Launching  Winch) 

Nobukazu  ASANO,  Hideichi  NAKAMOTO,  Tetsuo  FI  AG  I  WAR  A,  Shigeo  HIROSE 
This  paper  introduces  a  new  concept  of  "tethered  detachable  hook  (TDH)"  and  its  launching  winch.  TDH 
system  is  the  device  which  will  be  mounted  on  a  mobile  robot  and  enhances  its  traversability  over 
extremely  hostile  terrain  by  launching  detachable  hook  to  nearby  objects,  producing  large  traction  force 
by  the  tether  and  detaching/recovering  the  hook  to  the  launcher  again.  In  this  paper  the  authors  first  of 
all  introduce  several  prototype  models  of  the  TDH.  We  then  discuss  the  design  of  latest  model  which 
features  pneumatic  detaching  mechanism,  the  pneumatic  launcher  and  the  reel  mechanism  having 
three  motion  states;  active  rotation,  free  rotation  and  braking.  Finally,  the  result  of  several  experiments 
of  constructed  TDH  model  will  be  explained. 
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Tuesday  July  14th  10:50-12:10  -  Perception  and  Control 
o  Experimental  Study  of  an  Optimal-Control-Based  Framework  for  Trajectory 
Planning,  Threat  Assessment,  and  Semi-Autonomous  Control  of  Passenger  Vehicles 
in  Hazard  Avoidance  Scenarios 

Sterling  J  Anderson,  Steven  C.  Peters ,  Tom  E.  Pilutti,  and  Karl  lagnemma 

This  paper  describes  the  design  of  an  optimal-control-based  active  safety  framework  that  performs 
trajectory  planning,  threat  assessment,  and  semi-autonomous  control  of  passenger  vehicles  in  hazard 
avoidance  scenarios.  The  vehicle  navigation  problems  if  formulated  as  a  constrained  optimal  control 
problem  with  constraints  bounding  a  navigable  region  of  the  road  surface.  A  model  predictive  controller 
iteratively  plans  an  optimal  vehicle  trajectory  through  the  constrained  corridor.  Metrics  from  this  "best- 
case"  scenario  establish  the  minimum  threat  posed  to  the  vehicle  given  its  current  state.  Based  on  this 
threat  assessment,  the  level  of  controller  intervention  required  to  prevent  departure  from  the  navigable 
corridor  is  calculated  and  driver/controller  inputs  are  scaled  accordingly.  This  approach  minimized 
controller  intervention  while  ensuring  that  the  vehicle  does  not  depart  from  a  navigable  corridor  of 
travel.  It  also  allows  for  multiple  actuation  modes,  diverse  trajectory-planning  objectives,  and  varying 
levels  of  autonomy.  Experimental  results  are  presented  here  to  demonstrate  the  framework's  semi- 
autonomous  performance  in  hazard  avoidance  scenarios. 

o  Receding  Horizon  Model-Predictive  Control  for  Mobile  Robot  Navigation  of 
Intricate  Paths 

Thomas  M.  Howard Colin  J.  Green ,  and  Alonzo  Kelly 

As  mobile  robots  venture  into  more  difficult  environments,  more  complex  state-space  paths  are 
required  to  move  safely  and  efficiently.  The  difference  between  mission  success  and  failure  can  be 
determined  by  a  mobile  robot's  capacity  to  effectively  navigate  such  paths  in  the  presence  of 
disturbances.  This  paper  describes  a  technique  for  mobile  robot  model  predictive  control  that  utilizes 
the  structure  of  a  regional  motion  plan  to  effectively  search  the  local  continuum  for  an  improved 
solution.  The  contribution,  the  receding  horizon  model-predictive  control  (RHMPC)  technique, 
specifically  addresses  the  problem  of  path  following  and  obstacle  avoidance  through  geometric 
singularities  and  discontinuities  such  as  cusps,  turn-in-place,  and  multi-point  turn  maneuvers  in 
environments  where  terrain  shape  and  vehicle  mobility  effects  are  non-negligible.  The  technique  is 
formulated  as  an  optimal  controller  that  utilizes  a  model-predictive  trajectory  generator  to  relax 
parameterized  control  inputs  initialized  from  a  regional  motion  planner  to  navigate  safely  through  the 
environment.  Experimental  results  are  presented  for  a  six-wheeled  skid-steered  field  robot  in  natural 
terrain. 

o  Posterior  Probability  Estimation  Techniques  Embedded  in  a  Bayes  Filter  for 
Vibration-based  Terrain  Classification 
Philippe  Komma  and  Andreas  Zell 

Vibration  signals  acquired  during  robot  traversal  provide  enough  information  to  yield  a  reliable 
prediction  of  the  current  terrain  type.  In  a  recent  approach,  we  combined  a  history  of  terrain  class 
estimates  into  a  final  prediction.  We  therefore  adopted  a  Bayes  filter  taking  the  posterior  probability  of 
each  prediction  into  account.  Posterior  probability  estimates,  however,  were  derived  from  support 
vector  machines  only,  disregarding  the  capability  of  other  classification  techniques  to  provide  these 
estimates.  This  paper  considers  other  classifiers  to  be  embedded  into  our  Bayes  filter  terrain  prediction 
scheme,  each  featuring  different  characteristics.  We  show  that  the  best  classification  results  are 
obtained  using  a  combined  k-nearest  neighbor  and  support  vector  machine  approach  which  has  not 
been  considered  for  terrain  classification  so  far.  Furthermore,  we  demonstrate  that  other  classification 
techniques  also  benefit  from  the  temporal  filtering  of  terrain  class  predictions. 


14 


o  Towards  Visual  Arctic  Terrain  Assessment 
Stephen  Williams  and  Ay  anna  M.  Howard 

Many  important  scientific  studies,  particularly  those  involving  climate  change,  require  weather 
measurements  from  the  ice  sheets  in  Greenland  and  Antarctica.  Due  to  the  harsh  and  dangerous 
conditions  of  such  environments,  it  would  be  advantageous  to  deploy  a  group  of  autonomous,  mobile 
weather  sensors,  rather  than  accepting  the  expense  and  risk  of  human  presence.  For  such  a  sensor 
network  to  be  viable,  a  method  of  navigating,  and  thus  a  method  of  terrain  assessment,  must  be 
developed  that  is  tailored  for  arctic  hazards.  An  extension  to  a  previous  arctic  terrain  assessment 
method  is  presented,  which  is  able  to  produce  dense  terrain  slope  estimates  from  a  single  camera.  To 
validate  this  methodology,  a  set  of  prototype  arctic  rovers  have  been  designed,  constructed,  and  fielded 
on  a  glacier  in  Alaska. 

Tuesday  July  14th  14:20-15:40 -Tracking  and  Servoing 
o  Pedestrian  Detection  and  Tracking  Using  Three-Dimensional  LADAR  Data 
Luis  E.  Navarro-Serment,  Christoph  Mertz,  and  Martial  Hebert 

The  approach  investigated  in  this  work  employs  three-dimensional  LADAR  measurements  to  detect  and 
track  pedestrians  over  time.  The  sensor  is  employed  on  a  moving  vehicle.  The  algorithm  quickly  detects 
the  objects  which  have  the  potential  of  being  humans  using  a  subset  of  these  points,  and  then  classifies 
each  object  using  statistical  pattern  recognition  techniques.  The  algorithm  uses  geometric  and  motion 
features  to  recognize  human  signatures.  The  perceptual  capabilities  described  form  the  basis  for  safe 
and  robust  navigation  in  autonomous  vehicles,  necessary  to  safeguard  pedestrians  operating  in  the 
vicinity  of  a  moving  robotic  vehicle. 

o  Passive,  long-range  detection  of  Aircraft:  Towards  a  field  deployable  Sense  and 
Avoid  System 

Debadeepta  Dey,  Christopher  Geyer,  Sanjiv  Singh ,  Matt  Digioia 

Unmanned  Aerial  Vehicles  (UAVs)  typically  fly  blind  with  operators  in  distant  locations.  Most  UAVs  are 
too  small  to  carry  a  traffic  collision  avoidance  system  (TCAS)  payload  or  transponder.  Collision  avoidance 
is  currently  done  by  flight  planning,  use  of  ground  or  air  based  human  observers  and  segregated  air 
spaces.  US  lawmakers  propose  commercial  unmanned  aerial  systems  access  to  national  airspace  (NAS) 
by  30th  September  2013.  UAVs  must  not  degrade  the  existing  safety  of  the  NAS,  but  the  metrics  that 
determine  this  have  to  be  fully  determined  yet.  It  is  still  possible  to  state  functional  requirements  and 
determine  some  performance  minimums.  For  both  manned  and  unmanned  aircraft  to  fly  safely  in  the 
same  airspace  UAVs  will  need  to  detect  other  aircraft  and  follow  the  same  rules  as  human  pilots. 

o  Multiclass  Multimodal  Detection  and  Tracking  in  Urban  Environments 
Luciano  Spinello ,  Rudolph  Triebel,  and  Roland  Siegwart 

This  paper  presents  a  novel  approach  to  detect  and  track  pedestrians  and  cars  based  on  the  combined 
information  retrieved  from  a  camera  and  a  laser  range  scanner.  Laser  data  points  are  classified  using 
boosted  Conditional  Random  Fields  (CRF),  while  the  image  based  detector  uses  an  extension  of  the 
Implicit  Shape  Model  (ISM),  which  learns  a  codebook  of  local  descriptors  from  a  set  of  hand-labeled 
images  and  uses  them  to  vote  for  centers  of  detected  objects.  Our  extensions  to  ISM  include  the 
learning  of  object  sub-parts  and  template  masks  to  obtain  more  distinctive  votes  for  the  particular 
object  classes.  The  detections  from  both  sensors  are  then  fused  and  the  objects  are  tracked  using  an 
Extended  Kalman  Filter  with  multiple  motion  models.  Experiments  conducted  in  real-world  urban 
scenarios  demonstrate  the  usefulness  of  our  approach. 
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o  Vision-Based  Vehicle  Trajectory  Following  with  Constant  Time  Delay 
Hien  K.  Goi,  Timothy  D.  Borfoot,  Bruce  A.  Francis ,  and  Jared  L.  Giesbrecht 
A  convoy  problem  is  formulated  and  solved  for  two  four-wheeled  vehicles.  The  task  is  for  the  second 
vehicle  to  follow  the  leader's  trajectory  with  a  constant  time  delay.  This  delayed  trajectory  can  be 
viewed  as  the  trajectory  of  a  delayed  leader.  This  novel  constant-time-delay  concept  allows  for  the 
estimation  of  the  delayed  leader's  speed  and  heading  using  the  vehicle  kinematics.  Decoupled 
longitudinal  and  lateral  controllers  are  developed  based  on  the  constant-time-delay  approach.  The 
lateral  controller  includes  a  look-ahead  feature  to  compensate  for  steering  delays.  Successful  field  trials 
were  conducted  with  full-sized  military  vehicles  on  a  1.3-kilometre  test  track.  The  tracking  errors  from 
differential  global  positioning  system  (DGPS)  ground  truth  covering  13  kilometres  are  presented. 

Tuesday  July  14th  16:00-17:00  -  Mining  Robotics 
o  Swing  trajectory  control  for  large  excavators 
A.  W.  Denman ,  P.  R.  McAree,  M.  P.  Kearney ,  A.  W.  Reid,  and  K.  J.  Austin 

There  is  a  strong  push  within  the  mining  sector  to  automate  equipment  such  as  large  excavators.  A 
challenging  problem  is  the  control  of  motion  on  high  inertia  degrees  of  freedom  where  the  actuators  are 
constrained  in  the  power  they  can  deliver  to  and  extract  from  the  system  and  the  machine's  underlying 
control  system  sits  between  the  automation  system  and  the  actuators.  The  swing  motion  of  an  electric 
mining  shovel  is  a  good  example.  This  paper  investigates  the  use  of  predictive  models  to  achieve 
minimum  time  swing  motions  in  order  to  address  the  question  what  level  of  performance  is  possible  in 
terms  of  realizing  minimum  time  motions  and  accurate  positional  control.  Experiments  are  described 
that  explore  these  questions.  The  work  described  is  associated  with  a  project  to  automate  an  electric 
mining  shovel  and  whilst  the  control  law  discussed  here  is  a  much  simplified  form  of  that  used  in  this 
work,  the  experimental  study  sheds  considerable  light  on  the  problem. 

o  The  Development  of  a  Telerobotic  Rock  Breaker 
Elliot  Duff,  Con  Caris,  Adrian  Bonchis,  Ken  Taylor,  Chris  Gunn  and  Matt  Adcock 
This  paper  describes  the  development  of  a  tele-robotic  rock  breaker  deployed  at  a  mine  over  lOOOkms 
from  the  remote  operations  centre.  This  distance  introduces  a  number  of  technical  and  cognitive 
challenges  to  the  design  of  the  system,  which  have  been  addressed  with  the  development  of  shared 
autonomy  in  the  control  system  and  a  mixed  reality  user  interface.  A  number  of  trials  were  conducted, 
culminating  in  a  production  field  trial,  which  demonstrated  that  the  system  is  safe,  productive 
(sometimes  faster)  and  integrates  seamlessly  with  mine  operations. 

o  Camera  and  LIDAR  Fusion  for  Mapping  of  Actively  Illuminated  Subterranean  Voids 
Uland  Wong,  Ben  Garney,  Warren  Whittaker  and  Red  Whittaker 

A  method  is  developed  that  improves  the  accuracy  of  super-resolution  range  maps  over  interpolation  by 
fusing  actively  illuminated  HDR  camera  imagery  with  LIDAR  data  in  dark  subterranean  environments. 
The  key  approach  is  shape  recovery  from  estimation  of  the  illumination  function  and  integration  in  a 
Markov  Random  Field  (MRF)  framework.  A  virtual  reconstruction  using  data  collected  from  the  Bruceton 
Research  Mine  is  presented. 

Wednesday  July  15th  9:30-10:30  -  Human  Robot  Interaction 
o  Bandit-Based  Online  Candidate  Selection  for  Adjustable  Autonomy 
Boris  Sofman,  J.  Andrew  Bagnell,  and  Anthony  Stentz 

In  many  robot  navigation  scenarios,  the  robot  is  able  to  choose  between  some  number  of  operating 
modes.  One  such  scenario  is  when  a  robot  must  decide  how  to  trade-off  online  between  autonomous 
and  human  tele-operation  control.  When  little  prior  knowledge  about  the  performance  of  each  operator 
is  known,  the  robot  must  learn  online  to  model  their  abilities  and  be  able  to  take  advantage  of  the 
strengths  of  each.  We  present  a  bandit-based  online  candidate  selection  algorithm  that  operates  in  this 
adjustable  autonomy  setting  and  makes  choices  to  optimize  overall  navigational  performance.  We 
justify  this  technique  through  such  a  scenario  on  logged  data  and  demonstrate  how  the  same  technique 
can  be  used  to  optimize  the  use  of  high-resolution  overhead  data  when  its  availability  is  limited. 
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o  Using  Virtual  Articulations  to  Operate  High-DoF  Inspection  and  Manipulation 
Motions 

Marsette  Vono,  David  Mittman,  Jeffrey  S.  Norris ,  and  Daniela  Rus 

We  have  developed  a  new  operator  interface  system  for  high-DoF  articulated  robots  based  on  the  idea 
of  allowing  the  operator  to  extend  the  robot's  actual  kinematics  with  virtual  articulations.  These  virtual 
links  and  joints  can  model  both  primary  task  DoF  and  constraints  on  whole-robot  coordinated  motion. 
Unlike  other  methods,  our  approach  can  be  applied  to  robots  and  tasks  of  arbitrary  kinematic  topology, 
and  allows  specifying  motion  with  a  scalable  level  of  detail.  We  present  hardware  results  where 
NASA/JPL's  All-Terrain  Hex-Legged  Extra-Terrestrial  Explorer  (ATHLETE)  executes  previously  challenging 
inspection  and  manipulation  motions  involving  coordinated  motion  of  all  36  of  the  robot's  joints. 

o  Field  Experiment  on  Multiple  Mobile  Robots  conducted  in  an  Underground  Mall 
Tomoaki  Yoshida,  Keiji  Nagatani,  Eiji  Koyanagi,  Yasushi  Hada,  Kazunori  Ohno ,  Shoichi 
Maeyama,  Hidehisa  Akiyama,  Kazuya  Yoshida ,  and  Satoshi  Tadokoro 

Rapid  information  gathering  during  the  initial  stage  of  investigation  is  an  important  process  in  case  of 
disasters.  However  this  task  could  be  very  risky,  or  even  impossible  for  human  rescue  crews,  when  the 
environment  has  contaminated  by  nuclear,  biological,  or  chemical  weapons.  We  developed  the 
information  gathering  system  using  multiple  mobile  robots  teleoperated  from  the  safe  place,  to  be 
deployed  in  such  situation.  In  this  paper,  we  described  functions  of  the  system  and  report  the  field 
experiment  conducted  in  a  real  underground  mall  to  validate  its  usability,  limitation,  and  requirements 
for  future  developments. 

Wednesday  July  15th  10:50-11:50  -  Marine  Robot  Design 
o  A  Communication  Framework  for  Cost-effective  Operation  of  AUVs  in  Coastal 
Regions 

Arvind  Pereira ,  Hordur  Heidarsson,  Carl  Oberg,  David  A.  Caron ,  Burton  Jones  and  Gaurav  S. 
Sukhatme 

Autonomous  Underwater  Vehicles  (AUVs)  are  revolutionizing  oceanography.  Most  high-endurance  and 
long-range  AUVs  rely  on  satellite  phones  as  their  primary  communications  interface  during  missions  for 
data/command  telemetry  due  to  its  global  coverage.  Satellite  phone  ( e.g .,  Iridium)  expenses  can  make 
up  a  significant  portion  of  an  AUV's  operating  budget  during  long  missions.  Slocum  gliders  are  a  type  of 
AUV  that  provide  unprecedented  longevity  in  scientific  missions  for  data  collection.  Here  we  describe  a 
minimally-intrusive  modification  to  the  existing  hardware  and  an  accompanying  software  system  that 
provides  an  alternative  robust  disruption-tolerant  communications  framework  enabling  cost-effective 
glider  operation  in  coastal  regions.  Our  framework  is  specifically  designed  to  address  multiple-  AUV 
operations  in  a  region  covered  by  multiple  networked  base-stations  equipped  with  radio  modems.  We 
provide  a  system  overview  and  preliminary  evaluation  results  from  three  field  deployments  using  a 
glider.  We  believe  that  this  framework  can  be  extended  to  reduce  operational  costs  for  other  AUVs 
during  coastal  operations. 

o  Multi-Robot  Collaboration  with  Range-Limited  Communication:  Experiments  with 
Two  Underactuated  ASVs 

Filippo  Arrichiello ,  Jnaneshwar  Das ,  Hordur  Heidarsson ,  Arvind  Pereira ,  Stefano  Chiaverini, 
Gaurav  S.  Sukhatme 

We  present  a  collaborative  team  of  two  under-actuated  autonomous  surface  vessels  (ASVs)  that 
performs  a  cooperative  navigation  task  while  satisfying  a  communication  constraint.  Our  approach  is 
based  on  the  use  of  a  hierarchical  control  structure  where  a  supervisory  module  commands  each  vessel 
to  perform  prioritized  elementary  tasks,  a  behavior-based  controller  generates  motion  directives  to 
achieve  the  assigned  tasks,  and  a  maneuvering  controller  generates  the  actuator  commands  to  follow 
the  motion  directives.  The  control  technique  has  been  tested  in  a  mission  where  a  set  of  target  locations 
spread  across  a  planar  environment  has  to  be  visited  once  by  either  of  the  two  ASVs  while  maintaining  a 
relative  separation  less  than  a  given  maximum  distance  (to  guarantee  inter-ASV  wireless 
communication).  Experiments  were  carried  out  in  the  field  with  a  team  of  two  ASVs  visiting  22  locations 
on  a  lake  surface  (approximately  30000 m2)  with  static  obstacles.  Results  show  a  30%  improvement  in 
mission  time  over  the  single-robot  case. 
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o  A  Simple  Reactive  Obstacle  Avoidance  Algorithm  and  Its  Application  in  Singapore 
Harbor 

Tirthankor  Bandyophadyay,  Lynn  Sarcione,  Franz  S.  Hover 

Autonomous  surface  craft  (ASC)  are  increasingly  attractive  as  a  means  for  performing  harbor  operations 
including  monitoring  and  inspection.  However,  due  to  the  presence  of  many  fixed  and  moving  structures 
such  as  pilings,  moorings,  and  vessels,  harbor  environments  are  extremely  dynamic  and  cluttered.  In 
order  to  move  autonomously  in  such  conditions  ASC's  must  be  capable  of  detecting  stationary  and 
moving  objects  and  plan  their  paths  accordingly.  We  propose  a  simple  and  scalable  online  navigation 
scheme,  wherein  the  relative  motion  of  surrounding  obstacles  is  estimated  by  the  ASC,  and  the  motion 
plan  is  modified  accordingly  at  each  time  step.  Since  the  approach  is  model-free  and  its  decisions  are 
made  at  a  high  frequency,  the  system  is  able  to  deal  with  highly  dynamic  scenarios.  We  deployed  ASC's 
in  the  Selat  Pauh  region  of  Singapore  Harbor  to  test  the  technique  using  a  short-range  2-D  laser  sensor; 
detection  in  the  rough  waters  we  encountered  was  quite  poor.  Nonetheless,  the  ASC's  were  able  to 
avoid  both  stationary  as  well  as  mobile  obstacles,  the  motions  of  which  were  unknown  a  priori.  The 
successful  demonstration  of  obstacle  avoidance  in  the  field  validates  our  fast  online  approach. 

Wednesday  July  15th  13:00-14:20  -  Underwater  Localization  and 
Mapping 

o  Trajectory  Design  for  Autonomous  Underwater  Vehicles  based  on  Ocean  Model 
Predictions  for  Feature  Tracking 

Ryan  N.  Smith ,  Yi  Chao ,  Burton  H.  Jones ,  David  A.  Caron ,  Peggy  P.  Li  and  Gaurav  S.  Sukhatme 
Trajectory  design  for  Autonomous  Underwater  Vehicles  (AUVs)  is  of  great  importance  to  the 
oceanographic  research  community.  Intelligent  planning  is  required  to  maneuver  a  vehicle  to  high¬ 
valued  locations  for  data  collection.  We  consider  the  use  of  ocean  model  predictions  to  determine  the 
locations  to  be  visited  by  an  AUV,  which  then  provides  near-real  time,  in  situ  measurements  back  to  the 
model  to  increase  the  skill  of  future  predictions.  The  motion  planning  problem  of  steering  the  vehicle 
between  the  computed  waypoints  is  not  considered  here.  Our  focus  is  on  the  algorithm  to  determine 
relevant  points  of  interest  for  a  chosen  oceanographic  feature.  This  represents  a  first  approach  to  an 
end  to  end  autonomous  prediction  and  tasking  system  for  aquatic,  mobile  sensor  networks.  We  design  a 
sampling  plan  and  present  experimental  results  with  AUV  retasking  in  the  Southern  California  Bight 
(SCB)  off  the  coast  of  Los  Angeles. 

o  AUV  Benthic  Habitat  Mapping  in  South  Eastern  Tasmania 

Stefan  B.  Williams ,  Oscar  Pizarro,  Michael  Jakuba,  Neville  Barrett 

This  paper  describes  a  two  week  deployment  of  the  Autonomous  Underwater  Vehicle  (AUV)  Sirius  on 
the  Tasman  Peninsula  in  SE  Tasmania  and  in  the  Huon  Marine  Protected  Area  (MPA)  to  the  South  West 
of  Hobart.  The  objective  of  the  deployments  described  in  this  work  were  to  document  biological 
assemblages  associated  with  rocky  reef  systems  in  shelf  waters  beyond  normal  diving  depths.  At  each 
location,  multiple  reefs  were  surveyed  at  a  range  of  depths  from  approximately  50  m  to  100  m  depth. 
We  illustrate  how  the  AUV  based  imaging  complements  benthic  habitat  assessments  to  be  made  based 
on  the  ship-borne  swath  bathymetry.  Over  the  course  of  the  10  days  of  operation,  19  dives  were 
undertaken  with  the  AUV  covering  in  excess  of  70  linear  kilometers  of  survey  and  returning  nearly 
160,000  geo-referenced  high  resolution  stereo  image  pairs.  These  are  now  being  analysed  to  describe 
the  distribution  of  benthic  habitats  in  more  detail. 

o  Sensor  Network  Based  AUV  Localisation 

David  Prasser  and  Matthew  Dunbabin 

The  operation  of  Autonomous  Underwater  Vehicles  (AUVs)  within  underwater  sensor  network  fields 
provides  an  opportunity  to  reuse  the  network  infrastructure  for  long  baseline  localisation  of  the  AUV. 
Computationally  efficient  localisation  can  be  accomplished  using  off-the-shelf  hardware  that  is 
comparatively  inexpensive  and  which  could  already  be  deployed  in  the  environment  for  monitoring 
purposes.  This  paper  describes  the  development  of  a  particle  filter  based  localisation  system  which  is 
implemented  onboard  an  AUV  in  real-time  using  ranging  information  obtained  from  an  ad-hoc 
underwater  sensor  network.  An  experimental  demonstration  of  this  approach  was  conducted  in  a  lake 
with  results  presented  illustrating  network  communication  and  localisation  performance. 
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o  Experiments  in  Visual  Localisation  around  Underwater  Structures 
Stephen  Nuske,  Jonathan  Roberts ,  David  Prasser  and  Gordon  Wyeth 

Localisation  of  an  AUV  is  challenging  and  a  range  of  inspection  applications  require  relatively  accurate 
positioning  information  with  respect  to  submerged  structures.  We  have  developed  a  vision  based 
localisation  method  that  uses  a  3D  model  of  the  structure  to  be  inspected.  The  system  comprises  a 
monocular  vision  system,  a  spotlight  and  a  low-cost  IMU.  Previous  methods  that  attempt  to  solve  the 
problem  in  a  similar  way  try  and  factor  out  the  effects  of  lighting.  Effects,  such  as  shading  on  curved 
surfaces  or  specular  reflections,  are  heavily  dependent  on  the  light  direction  and  are  difficult  to  deal 
with  when  using  existing  techniques.  The  novelty  of  our  method  is  that  we  explicitly  model  the  light 
source.  Results  are  shown  of  an  implementation  on  a  small  AUV  in  clear  water  at  night. 

Wednesday  July  15th  14:40-16:00-  Multi-Robot  Cooperation 
o  Leap-Frog  Path  Design  for  Multi-Robot  Cooperative  Localization 
Stephen  Fully,  George  Kantor,  and  Howie  Choset 

We  present  a  "leap-frog"  path  designed  for  a  team  of  three  robots  performing  cooperative  localization. 
Two  robots  act  as  stationary  measurement  beacons  while  the  third  moves  in  a  path  that  provides 
informative  measurements.  After  completing  the  move,  the  roles  of  each  robot  are  switched  and  the 
path  is  repeated.  We  demonstrate  accurate  localization  using  this  path  via  a  coverage  experiment  in 
which  three  robots  successfully  cover  a  20m  x  30m  area.  We  report  an  approximate  positional  drift  of 
1.1m  per  robot  over  a  travel  distance  of  140m.  To  our  knowledge,  this  is  one  of  the  largest  successful 
GPS-denied  coverage  experiments  to  date. 

o  A  Location-Based  Algorithm  for  Multi-hopping  State  Estimates  within  a  Distributed 
Robot  Team 

Brian  J.  Julian ,  Mac  Schwager,  Michael  Angermann,  and  Daniela  Rus 

Mutual  knowledge  of  state  information  among  robots  is  a  crucial  requirement  for  solving  distributed 
control  problems,  such  as  coverage  control  of  mobile  sensing  networks.  This  paper  presents  a  strategy 
for  exchanging  state  estimates  within  a  robot  team.  We  introduce  a  deterministic  algorithm  that 
broadcasts  estimates  of  nearby  robots  more  frequently  than  distant  ones.  We  argue  that  this  frequency 
should  be  exponentially  proportional  to  an  importance  function  that  monotonically  decreases  with 
distance  between  robots.  The  resulting  location-based  algorithm  increases  propagation  rates  of  state 
estimates  in  local  neighborhoods  when  compared  to  simple  flooding  schemes. 

o  Cooperative  AUV  Navigation  using  a  Single  Surface  Craft 
Maurice  F.  Fallon ,  Georgios  Papadopoulos  and  John  J.  Leonard 

Maintaining  accurate  localization  of  an  autonomous  underwater  vehicle  (AUV)  is  difficult  because 
electronic  signals  such  as  GPS  are  highly  attenuated  by  water  making  established  land-based  localization 
systems,  such  as  GPS,  useless  underwater.  Instead  we  propose  an  alternative  approach  which  integrates 
position  information  of  other  vehicles  to  reduce  the  error  and  uncertainty  of  the  on-board  position 
estimates  of  the  AUV.  This  approach  uses  the  WHOI  Acoustic  Modem  to  exchange  vehicle  localization 
estimates— albeit  at  low  transmission  rates— while  simultaneously  estimating  inter-vehicle  range.  The 
performance  capabilities  of  the  system  were  tested  using  Oceanserver's  Iver2  and  the  MIT  Scout  kayaks. 
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o  Multi-Robot  Fire  Searching  in  Unknown  Environment 
AH  Marjovi,  Joao  Gongalo  Nunes ,  Lino  Marques  and  Anibal  de  Almeida 

Exploration  of  an  unknown  environment  is  a  fundamental  concern  in  mobile  robotics.  This  paper 
presents  an  approach  for  cooperative  multi-robot  exploration,  fire  searching  and  mapping  in  an 
unknown  environment.  The  proposed  approach  aims  to  minimize  the  overall  exploration  time,  making  it 
possible  to  locate  fire  sources  in  an  efficient  way.  In  order  to  achieve  this  goal,  the  robots  should 
cooperate  in  an  effective  way,  so  they  can  individually  and  simultaneously  explore  different  areas  of  the 
environment  while  they  identify  fire  sources.  The  proposed  approach  employs  a  decentralized  frontier 
based  exploration  method  which  evaluates  the  cost-gain  ratio  to  navigate  to  target  way-points.  The 
target  way-points  are  obtained  by  an  A*  search  variant  algorithm.  The  potential  field  method  is  used  to 
control  the  robots'  motion  while  avoiding  obstacles.  When  a  robot  detects  a  fire,  it  estimates  the 
flame's  position  by  triangulation.  The  communication  between  the  robots  is  done  in  a  decentralized 
control  manner  where  they  share  the  necessary  data  to  generate  the  map  of  the  environment  and  to 
perform  cooperative  actions  in  a  behavioural  decision  making  way.  This  paper  presents  simulation  and 
experimental  results  of  the  proposed  exploration  and  fire  search  method  and  concludes  with  a 
discussion  of  the  obtained  results  and  future  improvements. 

Wednesday  July  15th  16:20-17:00  -  Service  Robotics 
o  New  Measurement  Concept  for  Forest  Harvester  Head 
Mikko  Miettinen,  Jakke  Kulovesi ,  Jouko  Kalmari  and  Arto  Visa  la 

A  new  measurement  concept  for  cut-to-length  forest  harvesters  is  presented  in  this  paper.  The  cut-to- 
length  method  means  that  the  trees  are  felled,  delimbed  and  cut-to-length  by  the  single-grip  harvester 
before  logs  are  transported  to  the  roadside.  The  concept  includes  measurements  done  to  standing  trees 
before  felling  to  calculate  optimal  length  of  logs.  The  modern  forest  harvesters  use  mechanical 
measurements  for  diameter  and  length.  In  this  paper,  we  will  discuss  different  non-contact  methods  of 
measuring  a  tree  stem  before  felling  and  during  the  cut-to-length  process.  Standing  tree  stems  are 
measured  with  a  3D  scanner  and  a  computer  vision  systems.  Trunk  processing  is  measured  with  a 
computer  vision  system.  Based  on  these  new  measurements,  tree  cutting  pattern  could  be  optimized 
and  harvester  automation  increased,  resulting  in  higher  resource  utilization. 

o  Expliner  -  Toward  a  Practical  Robot  for  Inspection  of  High-Voltage  Lines 
Paulo  Debenest ,  Michele  Guarnieri,  Kenskue  Takita,  Edwardo  F.  Fukushima,  Shigeo  Hirose, 
Kiyoshi  Tamura ,  Akihiro  Kimura ,  Hiroshi  Kubokawa,  Narumi  iwama,  Fuminori  Shiga,  Yukio 
Morimura  and  Youichi  Ichioka 

Preventive  maintenance  of  high-voltage  transmission  power  lines  is  a  dangerous  task,  but  the  obstacles 
mounted  on  the  lines  have  so  far  prevented  the  automation  of  this  task.  Expliner  aims  to  overcome  such 
obstacles  by  controlling  actively  the  position  of  its  center  of  mass,  thus  changing  its  configuration  as 
needed  when  moving  on  the  power  lines.  This  work  presents  the  design  of  Expliner  and  results  of  field 
experiments  performed  with  very  high  voltages  to  prove  the  effectiveness  of  the  proposed  concept. 

Thursday  July  16th  9:30-10:30  -  Planetary  Robotics 
o  Model  Predictive  Control  for  Mobile  Robots  with  Actively  Reconfigurable  Chassis 
P.  Michael  Furlong i,  Thomas  M.  Howard ,  and  David  Wettergreen 

Actively  reconfigurable  chassis  enable  planetary  mobile  robots  to  access  more  varieties  of  terrain.  While 
typical  approaches  for  exploiting  such  mechanisms  reply  on  feedback  control,  it  is  beneficial  to  consider 
actively  controlled  elements  at  planning  time  rather  than  during  motion  execution.  In  this  paper  we 
present  an  approach  for  extending  work  in  model-predictive  trajectory  generation  to  actively 
reconfigurable  chassis.  The  motion  planner  uses  a  kinematic  motion  model  and  a  terrain  shape  model  to 
determine  sequences  of  actions  that  minimize  a  cost  function  over  vehicle  attitude  by  modifying  the 
shape  of  the  velocity,  curvature,  and  chassis  configuration  profiles.  Simulation  and  field  results  are 
presented  demonstrating  the  benefits  of  this  technique  on  a  prototype  mobile  robot  for  lunar 
excavation. 
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o  Turning  Efficiency  Prediction  for  Skid  Steer  Robots  Using  Single  Wheel  Testing. 

Daniel  Flippo,  Richard  Heller  and  David  P.  Miller 

To  date,  most  field  robots  use  wheels  as  their  means  of  locomotion  (especially  true  of  planetary 
exploration  robots).  In  many  cases  these  robots  are  required  to  travel  significant  distances,  with  limited 
power,  and  over  rough  terrain.  All  of  which  make  wheels  a  major  component  contributing  to  their 
performance.  It  is  through  experimentation  and  iteration  that  effective  wheel  design,  for  a  given  rover 
in  a  given  mission,  can  be  achieved.  To  do  this,  the  SWEET  (Suspension  and  Wheel  Evaluation  and 
Experimentation  Testbed)  simulates  the  rover  environment  using  a  single  wheel  methodology.  The 
wheels  currently  being  tested  belong  to  the  SR2  skid  steer  Mars  rover  designed  and  built  at  the 
University  of  Oklahoma.  Simulating  a  skid  steer  turn  with  SWEET  is  achieved  by  varying  the  spinning  rate 
of  the  platform  under  the  wheel,  which  is  rotating  at  a  certain  rate,  and  recording  the  forces  incurred. 
These  forces  interact  in  such  a  way  that  the  relevant  mobility  properties  for  a  rover  can  be  predicted. 
This  experimentation  method  allows  for  cheap  and  timely  iterative  single  wheel  design. 

o  Field  Experiments  in  Mobility  and  Navigation  with  a  Lunar  Rover  Prototype 
David  Wettergreen ,  Dominic  Jonak,  David  Kohanbash,  Scott  Moreland ,  Spencer  Spiker ,  and 
James  Teza 

Scarab  is  a  prototype  rover  for  lunar  missions  to  survey  resources,  particularly  water  ice,  in  polar 
craters.  It  is  designed  as  a  prospector  that  would  use  a  deep  coring  drill  and  apply  soil  analysis 
instruments.  Its  chassis  can  transform  to  stabilize  its  drill  in  contact  with  the  ground  and  can  also  adjust 
posture  to  ascend  and  descent  steep  slopes.  Scarab  has  undergone  field  testing  at  lunar  analogue  sites 
in  Washington  and  Hawaii  in  an  effort  to  quantify  and  validate  its  mobility  and  navigation  capabilities. 
We  report  on  results  of  experiments  in  slope  ascent  and  descent  and  in  autonomous  kilometer-distance 
navigation  in  darkness. 

Thursday  July  16th  10:50-12:30  -  Localization 
o  Radar  Scan  Matching  SLAM  using  the  Fourier-Mellin  Transform 
Paul  Checchin,  Franck  Gerossier,  Christophe  Blanc ,  Roland  Chapuis  and  Laurent  Trassoudaine 
This  paper  is  concerned  with  the  Simultaneous  Localization  And  Mapping  (SLAM)  problem  using  data 
obtained  from  a  microwave  radar  sensor.  The  radar  scanner  is  based  on  Frequency  Modulated 
Continuous  Wave  (FMCW)  technology.  In  order  to  meet  the  needs  of  radar  image  analysis  complexity,  a 
trajectory-oriented  EKF-SLAM  technique  using  data  from  a  3605  field  of  view  radar  sensor  has  been 
developed.  This  process  makes  no  landmark  assumptions  and  avoids  the  data  association  problem.  The 
method  of  egomotion  estimation  makes  use  of  the  Fourier-Mellin  Transform  for  registering  radar 
images  in  a  sequence,  from  which  the  rotation  and  translation  of  the  sensor  motion  can  be  estimated.  In 
the  context  of  the  scan-matching  SLAM,  the  use  of  the  Fourier-Mellin  Transform  is  original  and  provides 
an  accurate  and  efficient  way  of  computing  the  rigid  transformation  between  consecutive  scans. 
Experimental  results  on  real-world  data  are  presented. 

o  An  Automated  Asset  Locating  System  (AALS)  with  Applications  to  Inventory 
Management 

Thomas  H.  Miller ,  David  A.  Stolfo,  and  John  R.  Spletzer 

In  this  work,  we  present  a  proof-of-concept  Automated  Asset  Locating  System  (AALS)  for  enhancing 
inventory  management.  AALS  integrates  LIDAR  and  RFID  sensor  measurements  into  a  Rao-Blackwellized 
particle  filter  for  simultaneously  localizing  its  pose  with  the  positions  of  assets  in  the  environment.  We 
present  significant  experimental  results  where  the  proof-of-concept  system  successfully  traveled  a  total 
distance  of  1.4  km  autonomously,  while  detecting  and  mapping  all  143  available  assets  in  real-time,  and 
with  a  mean  position  error  of  <  80  cm. 
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o  Active  SLAM  and  Loop  Prediction  with  the  Segmented  Map  Using  Simplified  Models 
Nathaniel  Fairfield  and  David  Wettergreen 

We  previously  introduced  the  SegSLAM  algorithm,  an  approach  to  the  simultaneous  localization  and 
mapping  (SLAM)  problem  that  divides  the  environment  up  into  segments,  or  submaps,  using  heuristic 
methods.  We  investigate  a  realtime  method  for  Active  SLAM  with  SegSLAM,  in  which  actions  are 
selected  in  order  to  reduce  uncertainty  in  both  the  local  metric  submap  and  the  global  topological  map. 
Recent  work  in  the  area  of  Active  SLAM  has  been  built  on  the  theoretical  basis  of  information  entropy. 
Due  to  the  complexity  of  the  SegSLAM  belief  state,  as  encoded  in  the  SegMap  representation,  it  is  not 
feasible  to  estimate  the  expected  entropy  of  the  full  belief  state.  Instead,  we  use  a  simplified  model  to 
heuristically  select  entropy-reducing  actions  without  explicitly  evaluating  the  full  belief  state.  We  discuss 
the  relation  of  this  heuristic  method  to  the  full  entropy  estimation  method,  and  present  results  from 
applying  our  planning  method  in  real-time  onboard  a  mobile  robot. 

o  Outdoor  Downward-facing  Optical  Flow  Odometry  with  Commodity  Sensors 
Michael  Dille,  Ben  Grochoisky,  and  Sanjiv  Singh 

Positioning  is  a  key  task  in  most  field  robotics  applications  but  can  be  very  challenging  in  GPS-denied  or 
high-slip  environments.  A  common  tactic  in  such  cases  is  to  position  visually,  and  we  present  a  visual 
odometry  implementation  with  the  unusual  reliance  on  optical  mouse  sensors  to  report  vehicle  velocity. 
Using  multiple  kilometers  of  data  from  a  lunar  rover  prototype,  we  demonstrate  that,  in  conjunction 
with  a  moderate-grade  inertial  measurement  unit,  such  a  sensor  can  provide  an  integrated  pose  stream 
that  is  at  times  more  accurate  than  that  achievable  by  wheel  odometry  and  visibly  more  desirable  for 
perception  purposes  than  that  provided  by  a  high-end  GPS-INS  system.  A  discussion  of  the  sensor's 
limitations  and  several  drift  mitigating  strategies  attempted  are  presented. 

o  Place  Recognition  using  Regional  Point  Descriptors  for  3D  Mapping 
Michael  Bosse  and  Robert  Ziot 

In  order  to  operate  in  unstructured  outdoor  environments,  globally  consistent  3D  maps  are  often 
required.  In  the  absence  of  a  absolute  position  sensor  such  as  GPS  or  modifications  to  the  environment, 
the  ability  to  recognize  previously  observed  locations  is  necessary  to  identify  loop  closures.  Regional 
point  or  keypoint  descriptors  are  a  way  to  encode  the  structure  within  a  small  local  region  as  a  fixed¬ 
sized  vector,  though  individually  do  not  include  enough  context  to  fully  identify  a  previously  seen  place. 
Multiple  queries  to  a  database  of  descriptor  vectors  can  quickly  identify  similar  features,  and  places  can 
be  recognized  from  a  consistent  set  of  descriptor  matches.  We  investigate  the  problem  of  designing 
informative  keypoint  descriptors  for  3D  laser  maps.  Several  models  are  considered  and  evaluated,  with 
a  particular  focus  on  the  optimal  descriptor  scale  and  keypoint  sampling  density.  The  approach  is 
evaluated  on  3D  laser  point  cloud  data  collected  from  a  vehicle  driving  in  unstructured  off-road 
environments.  Consistent  3D  maps  constructed  from  this  data  without  assistance  from  any  other  sensor 
(such  as  wheel  encoders,  GPS,  or  IMU)  demonstrate  the  effectiveness  of  our  approach. 

Thursday  July  16th  13:40-15:00  -  Mapping 

o  Scan-point  Planning  and  3-D  Map  Building  for  a  3-D  Laser  Range  Scanner  in  an 
Outdoor  Environment 

Keiji  NAGATANi,  Takayuki  Matsuzawa,  and  Kazuya  Yoshida 

During  search  missions  in  disaster  environments,  an  important  task  for  mobile  robots  is  map  building. 
An  advantage  of  three-dimensional  (3-D)  mapping  is  that  it  can  provide  depictions  of  disaster 
environments  that  will  support  robotic  teleoperations  used  in  locating  victims  and  aid  rescue  crews  in 
strategizing.  However,  the  3-D  scanning  of  an  environment  is  time-consuming  because  a  3-D  scanning 
procedure  itself  takes  a  time  and  scan  data  must  be  matched  at  several  locations.  Therefore,  in  this 
paper,  we  propose  a  scan-point  planning  algorithm  to  obtain  a  large  scale  3-D  map,  and  we  apply  a 
scan-matching  method  to  improve  the  accuracy  of  the  map.  We  discuss  the  use  of  scan-point  planning 
to  maintain  the  resolution  of  sensor  data  and  to  minimize  occlusion  areas.  The  scan-matching  method  is 
based  on  a  combination  of  the  Iterative  Closest  Point  (ICP)  algorithm  and  the  Normal  Distribution 
Transform  (NDT)  algorithm.  We  performed  several  experiments  to  verify  the  validity  of  our  approach. 
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o  Image  and  Sparse  Laser  Fusion  for  Dense  Scene  Reconstruction 
Alastoir  Harrison  and  Paul  Newman. 

This  paper  is  concerned  with  reconstructing  the  metric  geometry  of  a  scene  imaged  with  a  single  camera 
and  a  scanning  laser.  Our  aim  is  to  assign  each  image  pixel  with  a  range  value  using  both  image 
appearance  and  sparse  laser  data.  We  pose  the  problem  as  an  optimisation  of  a  cost  function 
encapsulating  a  spatially  varying  smoothness  cost  and  measurement  compatibility.  In  particular  we 
introduce  a  second  order  smoothness  term.  We  derive  cues  for  discontinuities  in  range  from  changes  in 
image  appearance  and  reflect  this  in  the  objective  function.  We  show  that  our  formulation  distills  down 
to  solving  a  large  linear  system  which  can  be  solved  swiftly  using  direct  methods.  Results  are  presented 
and  analysed  using  synthetic  cases  to  demonstrate  salient  behaviours  and  on  real  data  to  highlight  real- 
world  applicability. 

o  Relative  Motion  Threshold  for  Rejection  in  ICP  Registration 
Francois  Pomerleau,  Francis  Colas ,  Francois  Ferland  and  Francois  Michaud 
Simultaneous  Localization  and  Mapping  (SLAM)  iteratively  builds  a  map  of  the  environment  by  putting 
each  new  observation  in  relation  with  the  current  map.  This  relation  is  usually  done  by  scan  matching 
algorithms  such  as  Iterative  Closest  Point  (ICP)  where  two  sets  of  features  are  paired.  However  as  ICP  is 
sensitive  to  outliers,  methods  have  been  proposed  to  reject  them.  In  this  article,  we  present  a  new 
rejection  technique  called  Relative  Motion  Threshold  (RMT).  In  combination  with  multiple  pairing 
rejection,  RMT  identifies  outliers  based  on  error  produced  by  paired  points  instead  of  a  distance 
measurement,  which  makes  it  more  applicable  to  point-to-plane  error.  The  rejection  threshold  is 
calculated  with  a  simulated  annealing  ratio  which  follows  the  convergence  rate  of  the  algorithm. 
Experiments  demonstrate  that  RMT  performs  better  than  former  techniques  with  outliers  created  by 
dynamical  obstacles.  Those  results  were  achieved  without  reducing  convergence  speed  of  the  overall 
ICP  algorithm. 

o  Rover-Based  Surface  and  Subsurface  Modeling  for  Planetary  Exploration 
Paul  Furgale,  Tim  Barfoot,  and  Nadeem  Ghafoor 

We  develop  and  test  a  technique  for  the  creation  of  coupled  surface  and  subsurface  models.  Images 
from  a  stereo  camera  are  used  to  estimate  the  motion  of  a  rover  that  is  collecting  ground  penetrating 
radar  (GPR)  data.  The  motion  estimate  and  raw  sensor  data  are  used  to  build  two  novel  data  products: 
(1)  A  three-dimensional,  photorealistic  surface  model  coupled  with  a  ribbon  of  GPR  data,  and  (2)  a  two- 
dimensional,  topography-corrected  GPR  radargram  with  the  reconstructed  surface  topography  plotted 
above.  Each  result  is  derived  from  only  the  onboard  sensors  of  the  rover,  as  would  be  required  in  a 
planetary  exploration  setting.  These  techniques  were  tested  using  data  collected  in  a  Mars  analogue 
environment  on  Devon  Island  in  the  Canadian  High  Arctic.  GPR  transects  were  gathered  over  polygonal 
patterned  ground  similar  to  that  seen  by  the  Phoenix  lander  on  Mars.  Using  the  techniques  developed 
here,  scientists  may  remotely  explore  the  interaction  of  the  surface  topography  and  subsurface 
structure  as  if  they  were  on  site. 

Thursday  July  16th  15:20-16:40  -  Learning  and  Cognition 
o  Learning  to  identify  users  and  predict  their  destination  in  a  robotic  guidance 
application 

Xavier  Perrin ,  Francis  Colas ,  Cedric  Pradalier  and  Roland  Siegwart 

User  guidance  systems  are  relevant  to  various  applications  of  the  service  robotics  field,  among  which: 
smart  GPS  navigator,  robotic  guides  for  museum  or  shopping  malls  or  robotic  wheel  chairs  for  disabled 
persons.  Such  a  system  aims  at  helping  its  user  to  reach  its  destination  in  a  fairly  complex  environment. 
If  we  assume  the  system  is  used  in  a  fixed  environment  by  multiple  users  for  multiple  navigation  task 
over  the  course  of  days  or  weeks,  then  it  is  possible  to  take  advantage  of  the  user  routine:  from  the 
initial  navigational  choice,  users  can  be  identified  and  their  goal  can  be  predicted.  As  a  result  of  these 
prediction,  the  guidance  system  can  bring  its  user  to  its  destination  while  requiring  less  interaction.  This 
property  is  particularly  relevant  for  assisting  disabled  person  for  whom  interaction  is  a  long  and  complex 
task.  In  this  paper,  we  implement  a  user  guidance  system  using  a  dynamic  Bayesian  model  and  a 
topological  representation  of  the  environment.  This  model  is  evaluated  with  respect  to  the  quality  of  its 
action  prediction  in  a  scenario  involving  4  human  users,  and  it  is  shown  that  in  addition  to  the  user 
identity,  the  goals  and  actions  of  the  user  are  accurately  predicted. 
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o  Applied  Imitation  Learning  for  Autonomous  Navigation  in  Complex  Natural  Terrain 
David  Silver ,  J.  Andrew  Bagnell,  Anthony  Stentz 

Rough  terrain  autonomous  navigation  continues  to  pose  a  challenge  to  the  robotics  community.  Robust 
navigation  by  a  mobile  robot  depends  not  only  on  the  individual  performance  of  perception  and 
planning  systems,  but  on  how  well  these  systems  are  coupled.  When  traversing  rough  terrain,  this 
coupling  (in  the  form  of  a  cost  function)  has  a  large  impact  on  robot  performance,  necessitating  a  robust 
design.  This  paper  explores  the  application  of  Imitation  Learning  to  this  task  for  the  Crusher 
autonomous  navigation  platform.  Using  expert  examples  of  proper  navigation  behavior,  mappings  from 
both  online  and  offline  perceptual  data  to  planning  costs  are  learned.  Challenges  in  adapting  existing 
techniques  to  complex  online  planning  systems  are  addressed,  along  with  additional  practical 
considerations.  The  benefits  to  autonomous  performance  of  this  approach  are  examined,  as  well  as  the 
decrease  in  necessary  designer  interaction.  Experimental  results  are  presented  from  autonomous 
traverses  through  complex  natural  terrains. 

o  Distributed  Sensing  and  Situation  Aware  Mobile  Robots 
Fulvio  Mastrogiovanni,  Antonio  Sgorbissa  and  Renato  Zaccaria 

This  paper  introduces  a  cognitive  architecture  that  allows  service  robots  to  cooperate  with  smart 
environments  to  fulfill  user  assigned  global  tasks,  e.g.,  the  transport  of  various  kinds  of  material  in 
human  populated  environments,  such  as  hospitals.  Specifically,  knowledge  representation  issues  and 
context  assessment  strategies  are  introduced,  which  allows  event  monitoring  and  coordination  between 
service  robots  and  intelligent  applicances  to  occur.  Experimental  evaluation  is  thoroughly  described. 

o  Long  Term  Learning  and  Online  Robot  Behavior  Adaptation  for  Individuals  with 
Physical  and  Cognitive  Impairments 
Adriana  Tapus,  Cristian  Tapus  and  Maja  Mataric 

The  world's  population  is  growing  older,  and  therefore  a  wide  array  of  new  challenges  is  arising.  Most  of 
the  ageing  population  is  expected  to  need  physical  and  cognitive  assistance.  As  the  elderly  population 
continues  to  grow,  a  lot  of  research  started  to  be  dedicated  to  assistive  systems  aimed  at  promoting 
ageing-in-place,  facilitating  living  independently  in  one's  own  home  as  long  as  possible,  and  helping 
caregivers  and  doctors  to  provide  long-term  rehabilitation/therapy  protocols.  In  this  paper,  we  present 
a  new  long-term  learning  and  adaptive  socially  assistive  robotic  (SAR)  system  that  aims  to  provide  a 
customized  help  protocol  through  motivation,  encouragements,  and  companionship  to  users  suffering 
from  physical  and  cognitive  changes  related  to  aging  and/or  Alzheimer's  disease. 
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Terrain  Modeling  and  Following  Using  a 
Compliant  Manipulator  for  Humanitarian 
Demining  Applications 


Marc  Freese1,  Surya  P.  N.  Singh2,  William  Singhose3, 
Edwardo  F.  Fukushima1,  Shigeo  Hirose1 


Abstract  Operations  with  flexible,  compliant  manipulators  over  large  workspaces  relative 
to  the  manipulator  are  complicated  by  sensor  noise,  vibration,  and  measurement  bias. 
These  difficulties  are  compounded  by  unstructured  environments,  such  as  those  encoun¬ 
tered  in  humanitarian  demining.  By  taking  advantage  of  the  static  structure  of  the  terrain 
and  the  manipulator’s  fundamental  mechanical  characteristics,  a  series  of  adaptive  correc¬ 
tions  and  filters  refine  noisy  topographical  measurements.  These  filters  along  with  a  shaped 
actuation  scheme  can  generate  smooth  and  well-controlled  trajectories.  Experimental  test¬ 
ing  was  performed  on  a  field  robot  with  a  compliant,  3  m  long-reach  hybrid  manipulator 
and  a  stereo  vision  system  for  terrain  sensing.  The  proposed  method  provides  a  vertical 
tracking  precision  of  ±5  mm  on  a  variety  of  ground  clearings,  with  scanning  speeds  of  up  to 
0.5  m/s.  As  such,  it  can  agilely  move  the  attached  sensor(s)  through  precise  scanning  trajec¬ 
tories  that  are  very  close  to  the  ground.  This  method  improves  overall  detection  and  gen¬ 
eration  of  precise  maps  of  suspected  mine  locations. 


1.  Introduction 


While  robust  manipulation  in  difficult  field  conditions  is  still  in  its  infancy,  envi¬ 
ronmental  modeling  using  computer  vision  has  progressed  with  several  applica¬ 
tions,  including  autonomous  Martian  mapping  [10].  With  regards  to  manipulation, 
variation  and  noise  are  routinely  minimized  by  stiffening  the  structure  or  con¬ 
straining  the  operation  [1,  2].  In  humanitarian  demining,  the  increased  sensitivity 
needed  by  the  metal  detector  precludes  the  addition  of  proximal  metal  content  and 
requires  minimal  mean  and  variance  in  the  air  gap  to  the  ground.  In  demining,  a 
myriad  of  sensing  technologies  have  been  proposed  [7].  However,  relatively  little  at¬ 
tention  has  been  directed  towards  the  field  manipulation  requirements  for  automat¬ 
ing  the  dangerous  and  tedious  ground  scanning  task. 


1  Department  of  Mechanical  and  Aerospace  Eng.,  Tokyo  Institute  of  Technology, 
e-mail:  {marc@sms,  fukusima@mes,  hirose@mes}. titech.ac.jp 

2  Australian  Centre  for  Field  Robotics,  University  of  Sydney,  e-mail:  spns@usyd.edu.au 

3  School  of  Mechanical  Eng.,  Georgia  Institute  of  Technology,  e-mail:  singhose@gatech.edu 
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Fig.  1  Photograph  of  Gryphon  in  a  test  humanitarian  demining  field. 


An  autonomous  mobile  robot  named  Gryphon  [6]  has  been  developed  to  assist 
the  mine  detection  process.  As  shown  in  Figure  1,  the  robot  is  based  on  an  all  ter¬ 
rain  vehicle  (ATV)  [4]  to  which  a  custom  hybrid  robotic  manipulator  [5]  is  added. 
This  lightweight  and  counter-balanced  3 -DOF  arm  is  made  from  glass-fiber  rein¬ 
forced  plastic  (GFRP)  which  is  compatible  with  sensitive  metal  detection  (MD) 
operation  that  requires  minimal  metal  near  the  sensor  [3].  As  shown  in  Figure  2, 
this  design  has  a  tip  flexure  of  up  to  5  cm  (for  a  2  m  link)  due  to  gravitational 
forces  only.  While  the  structure  could  have  been  stiffened,  the  compliance  also 
provides  some  safety  in  the  event  of  a  collision.  Secondary  motion  from  the  ATV 
suspension  is  reduced,  but  not  eliminated,  through  counterbalancing  the  manipula¬ 
tor.  Hence,  there  is  uncertainty  in  sensor  location  with  respect  to  the  ground. 


Joint3 


Wrist  joints 
Sensor 


Manipulator 


Ground  frame  FG 


Joint  2 


Counter-weight 
Joint  1  (yaw) 


Fig.  2  Schematic  of  Gryphon  and  its  principal  frames  of  reference. 


As  illustrated  by  the  control  architecture  in  Figure  3,  Gryphon  operates  by  driv¬ 
ing  close  to  a  region  of  interest,  then  while  the  ATV  is  stationary,  generates  a  ste¬ 
reo  map.  As  detailed  in  Section  2,  these  are  iteratively  refined  to  construct  a  geo- 
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metric  terrain  model.  By  iteratively  operating  using  a  local  model,  absolute  recti¬ 
fication  is  not  required  because  later  processing  stages  account  for  aberrations 
through  command  shaping  [6,  11].  This  approach  adds  robustness  without  the 
need  to  identify  the  origin  of  imprecision.  However,  its  use  of  linear  models 
means  its  highest  accuracy  is  near  regions  used  to  perform  the  system-level  cali¬ 
bration  where  errors  are  small,  hence  defining  the  zone  of  effectiveness.  From  this 
terrain  model  a  desired  path  for  the  manipulator  endpoint  is  generated  and  cor¬ 
rected  for  the  height  errors  and  the  travel  of  a  detector  body  (as  opposed  to  end- 
effector  frame),  as  detailed  in  Section  3.  The  final  path  is  close  to  the  ground 
while  maintaining  the  best  possible  orientations  for  the  detector.  Experimental 
tests  of  the  manipulator  and  the  control  architecture  are  presented  in  Section  4. 


Fig.  3  Terrain  scanning  motion  generation  architecture.  (Darker  boxes  indicate  process  outputs). 


2.  Terrain  Modeling 

In  order  to  plan  operations  that  map  the  location  of  suspected  mines  found  by  the 
sensor,  it  is  necessary  to  form  a  geometric  model  of  the  terrain. 

The  terrain  modeling  begins  by  sensing  the  environment  using  stereo  vision.  A 
BumbleBee  (model  BB-HICOL-60,  Point  Grey  Research)  stereo  vision  camera 
(location  shown  in  Figure  2),  acquires  several  depth  maps  from  different  manipu¬ 
lator  configurations  to  cover  the  region  of  interest.  Compared  to  laser  mapping, 
this  generates  maps  quickly,  but  more  noise,  especially  in  regions  lacking  texture. 
Map  accuracy  depends  on  object  range  and  camera  aberrations.  Acquired  raw 
depth  maps  (or  disparity  maps)  are  checked  for  consistency  by  subdividing  them 
into  coherent  patches.  Patches  that  fail  to  comply  with  certain  criteria  (e.g.  do  not 
represent  possible  terrain  locations)  are  then  discarded. 
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2.1  Conditional  Planar  Filtering 

Due  to  limitations  on  camera  resolution  and  calibration,  a  raw  model  generated 
from  the  camera  data  is  significantly  degraded  by  noise.  Median  filtering  tech¬ 
niques  are  primarily  effective  against  the  shot  noise,  but  do  not  remove  high- 
frequency  components.  Simply  smoothing  (i.e.,  spatial  low-pass  filtering)  is  insuf¬ 
ficient  as  this  results  in  a  degradation  of  features,  especially  at  obstacles  bounda¬ 
ries,  which  could  lead  to  a  collision  between  the  end-effector  and  the  terrain. 

Thus,  an  adaptive  filter  based  on  the  average  region  planarity  is  used  to  adjust 
filter  kernel  sizes  for  both  Gaussian  smoothing  and  median  filters  (i.e.,  the  condi¬ 
tional  planar  filter  [CPF]).  The  planarity  of  the  selected  region  is  determined  by 
calculating  the  mean  deviation  between  each  point  in  the  region  and  the  region’s 
corresponding  (least-squares)  best-fitting  plane.  If  the  deviation  is  small,  the  re¬ 
gion  is  assumed  to  be  planar,  hence  giving  a  conservative,  but  rapidly  computed 
terrain  classification.  Based  on  this,  the  strength  of  the  Gaussian  and  median  ker¬ 
nels  are  varied  depending  on  the  deviation  magnitude.  Applying  this  to  the  per¬ 
ceived  data  yields  a  less  noisy,  but  still  potentially  offset,  or  biased,  map. 


2.2  Height  Map  Generation 


To  simplify  and  speed-up  terrain  data  processing,  the  depth  maps  are  transformed 
to  a  common  height  map  function.  This  map  represents  the  terrain  as  a  series  of 
height  (or  z-coordinate)  values  at  point  locations  specified  by  a  uniform  mesh  in 
the  ground  plane  of  the  ground  frame.  This  is  done,  for  instance,  via  Delauney  tri¬ 
angulation  methods  with  increased  precision  obtained  through  spatial  weighted 
averages  of  the  sensed  data.  The  obtained  height  map  offers  the  advantage  of  two 
dimensionally  indexed  queries  on  the  terrain  model  and  facilitates,  by  linear  inter¬ 
polation,  a  mechanism  to  fill  holes  and  patches  that  may  have  arisen  due  to  occlu¬ 
sions  or  lack  of  texture. 


2.3  Height  Map  Expansion 

At  this  stage  of  the  process,  the  height  map  is  a  good  approximation  of  the  under¬ 
lying  topographical  information;  however,  it  is  often  desired  to  perform  the  scan¬ 
ning  at  a  constant  distance  from  the  ground  (i.e.,  a  scanning  gap).  For  that  purpose 
we  expand  the  height  map  via  an  envelope  expansion.  That  is,  we  solve  for  a  new 
surface,  fenv,  whose  distance  from  fterr  is  given  by  the  scanning  gap.  This  is  illus¬ 
trated  in  Figure  4.  Note  that  the  simple  approach  of  shifting  the  height  map  verti¬ 
cally  would  be  a  bad  approximation  for  highly  curved  or  inclined  terrain. 
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Fig.  4  Terrain  surface  expansion. 


Considering  the  continuous  case,  the  desired  surface  is  easily  obtained  with  the 
following  parametric  equations: 

Xenv  =  Xterr  ~  *  '  Qfterr  (X,err  >  r)  '  9x  0) 

yenv  =  y terr  ~  *  '  Qfterr  (• Xterr ,  7 ten  )  1  fy  (2) 

~  env  —  fterr  ( Xterr  5  J’/trr  )  +  ^  _  i.Xenv  >  y  env  )  (2) 


(4) 


3.  Path  Generation 


The  path  generation  for  the  manipulator  takes  as  inputs  the  expanded  terrain 
model  from  the  previous  section  and  the  manipulator  configuration,  taking  into 
account  joint  limitations  of  the  wrist  mechanism. 


3.1  Scanning  Scheme 

Two  different  linear  incrementing  scanning  schemes  are  available:  in  joint  space 
(giving  a  circular  profile  since  the  base  joint  is  revolute)  or  in  workspace  (giving  a 
rectangular  profile).  The  trajectories  are  smoothly  combined  at  their  extremities  to 
reduce  unnecessary  slowdowns  during  direction  changes.  Performing  this  in  the 
joint-space  of  the  robot  simplifies  joint  coordination  by  reducing  simultaneous 
motions  and  velocity  variation,  and  thus  is  dynamically  more  stable.  This  reduces 
power  consumption  and  arm  vibration  and  improves  detection  performance  by  re¬ 
ducing  sensor  location  uncertainty. 


3.2  Terrain  Sampling 

Either  of  the  above  described  scanning  schemes  produce  a  planar  path,  where  only 
the  x-y  coordinate  components  are  defined.  The  z  coordinate  is  obtained  by  sam¬ 
pling  the  corresponding  position  on  the  expanded  terrain.  Orientation  of  the  detec¬ 
tor  at  this  position  is  calculated  using  the  normal  vector  to  the  expanded  terrain  us¬ 
ing  the  following  relationships: 
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’‘path 


"  f  env  path  ’  T path  ) 


nPath  =  [-y-dfem(xpath,ypath)ldx  -y-dfenXxpath,ypath)ISy  y] 

y  =  (dfenV(Xpa,k’ypaJ/dxf  +  {dfe„v(X path’  J path)' 1  fyf  +  ^ 


(5) 

(6) 
(7) 


3.3  Advanced  Terrain  Following  (ATF) 

The  planned  path  has  focused  on  guarantying  that  the  center  of  the  sensor  follows 
the  expanded  terrain.  With  the  consequence  that  the  rest  of  the  sensor  body  is  still 
free  to  enter  the  scanning  gap  or  even  to  collide  with  the  terrain  at  positions  where 
the  curvature  is  concave,  thus,  it  is  important  to  consider  the  sensor  as  an  extended 
body  attached  to  the  manipulator  end  point.  This  is  performed  by  modeling  the  sen¬ 
sor  via  a  series  of  control  points  along  its  lower  surface;  where,  the  number  and  po¬ 
sition  of  points  has  been  chosen  to  best  approximate  the  contour  and  surface. 

For  each  sensor  position  on  the  trajectory,  all  control  points  are  tested  for  pos¬ 
sible  collisions  with  the  expanded  terrain.  For  each  point  under  the  grid,  a  small 
correction  rotation  (e.g.,  V20)  of  the  sensor  is  performed  in  the  order  of  vertical  de¬ 
viation  as  illustrated  in  Figure  5a.  This  is  iteratively  repeated  until  an  equilibrium 
orientation  is  reached  (cf.  Figure  5d).  The  sensor  orientation  is  then  corrected  to 
respect  configuration-space  constraints  by  limiting  wrist  joint  values  to  within  al¬ 
lowable  ranges  and  the  sensor’s  z-coordinate  is  modified  so  as  to  have  no  control 
point  in  the  scanning  gap  (cf.  Figure  5e).  Such  an  approach  can  lead  to  large  gaps 
between  the  detector  center  and  the  terrain  at  concave  positions,  but  it  guarantees  a 
constant  minimum  distance  between  the  sensor  as  a  whole  and  the  terrain. 


Fig.  5.  Path  correction  scheme  with  the  dots  indicating  control  points. 

A  reactionary  approach,  by  comparison,  would  simply  lift  the  sensor  without 
changing  its  orientation  and  thus  is  not  sufficient  as  it  would  lead  to  non-ideal  con¬ 
figurations  with  large  and  changing  air  gaps  and  potentially  excessive  motion. 


3.4  Partial  Path  Correction 

While  perception  inaccuracies  were  corrected  for  the  horizontal  plane  in 
the  Terrain  Modeling  Section,  the  model  only  partially  accounted  for  mechanical 
inaccuracies.  The  type  of  mechanical  inaccuracies  corrected  for  are  typically  those 
arising  from  mechanical  compliances  (in  the  links  and  base)  and  calibration  errors. 
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Within  the  limitations  of  the  zone  of  effectiveness ,  the  path’s  x  and  y  components 
are  modified  appropriately.  This  second  part  of  the  system-level  calibration  con¬ 
sists  of  applying  a  radial  offset  and  scaling: 

P'  =  P-  scaling 0  +  offset  p  (8) 

and  an  angular  offset: 

<p' =  <p  + offset  v  (9) 

The  above  correction  parameters  are  obtained  by  measuring  the  discrepancy  be¬ 
tween  the  real  and  computer-model  manipulator  tip  position.  Orientation  of  the  de¬ 
tector  is  kept  unchanged.  The  path  becomes: 

Vpa,k  yPa,h  zJ=[p'-co^  ff-smqi  fenJp>- cosp,p- sin^)f  (10) 


3.5  Final  Path  Correction 


The  last  modification  of  the  path,  and  the  final  step  of  the  system-level  calibration, 
is  the  correction  along  the  z-axis.  Each  x-y  position  is  assigned  an  individual  verti¬ 
cal  correction  factor  that  is  obtained  by  linear  interpolation  between  values  of  an 
Overall  Calibration  Matrix  (OCM): 


^ path  ^ path  3 ^Correction  path  ’  T path  ) 


(11) 


The  OCM  is  obtained  by  mapping  a  relatively  flat  terrain,  generating  a  non- 
expanded  height  map,  and  then  manually  driving  the  sensor  to  touch  the  terrain  at 
various  spots  within  the  workspace.  At  each  spot,  the  necessary  correction  factor 
is  given  by  the  deviation  of  the  computer  model  of  the  sensor  from  the  terrain  con¬ 
tact.  Spots  are  then  used  to  generate  a  Delaunay  triangulated  surface  whose  height 
at  a  given  position  gives  the  amount  of  correction  needed.  The  OCM  is  directly 
extracted  from  that  surface  at  regular  grid  intervals  along  the  x-y  plane  in  the  ma¬ 
nipulator  frame. 


4.  Experiments  and  Results 

Several  experiments  were  conducted  with  Gryphon  to  assess  performance  of  the 
terrain  following  method  described  in  the  previous  sections.  Precise  quantification 
was  performed  by  replacing  the  detector  payload  (i.e.,  metal  detectors  and/or 
ground  penetrating  radar)  with  a  laser  rangefinder  (SICK  DME  2000,  SICK)  to 
track  the  scanning  gap  at  the  region  of  maximum  detector  sensitivity.  The  effects 
of  command  shaping,  the  implementation  of  which  on  Gryphon  is  detailed  in  [6], 
were  analyzed  by  attaching  a  3 -axis  accelerometer  to  the  tip  of  the  manipulator. 
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4.1  Filter  Performance 


Other  operating  conditions,  where  possible,  were  adjusted  to  match  those  of  a 
typical  payload  (e.g.,  weight  was  added  to  the  laser  rangefinder).  A  sandy  terrain 
under  natural  light  conditions  (i.e.,  a  fair  weather  day)  was  used  with  a  circular 
scanning  scheme  with  a  scanning  rate  of  100  mm/s  to  evaluate  the  filter  perform¬ 
ance,  and  the  effects  of  the  advanced  terrain  following. 

This  set  of  tests  used  a  10  cm  height  map  expansion  and  no  ATF.  The  map  fil¬ 
ter  was  varied  between  three  algorithms:  (a)  minimal  filtering  (a  control  case 
based  on  a  raw  depth  map  with  median  filtering),  (b)  Gaussian  Smoothing,  and  (c) 
Conditional  Planar  Filter  (CPF). 

Filter  performance  was  tested  on  two  different  terrain  profiles:  (i)  relatively  flat 
terrain  and  (ii)  the  same  terrain  with  a  rather  challenging  hump  (or  obstacle).  In 
both  cases,  the  scanning  procedure  comprised  several  passes. 

The  obstacle,  shown  in  Figure  6,  represents  extreme  slopes  and  contours  for 
expected  demining  conditions.  The  pass  selected  for  comparison  was  the  most 
challenging  and  includes  a  70°  curvature. 


Fig.  6  Scanning  pass  used  during  experiments. 


The  advantage  of  the  CPF  over  Gaussian  smoothing  is  visible  in  Figure  7.  The 
Gaussian  degrades  terrain  features,  which  results  in  a  loss/gain  of  obstacle  height. 


Fig.  7  Scanning  pass  over  obstacle,  filter  comparison. 
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4.2  Effects  of  Command  Shaping 

The  experiments  also  validate  the  use  of  robust  control  to  address  sensing  and  planning 
aberrations.  In  the  case  of  Gryphon,  an  unshaped  trajectory  is  first  calculated  by  gener¬ 
ating  a  trapezoidal  velocity  profile  in  joint  space.  The  trajectory  is  then  modified  by  a 
Zero  Vibration  and  Derivative  (ZVD)  shaper  (cf.  ref.  [11]).  The  sensor  loading  is  large 
enough  to  that  arm  compliance  is  visible.  In  particular,  measured  residual  vibration  at 
the  tip  of  the  arm  is  approximately  ±20  mm.  This  makes  it  difficult  for  a  feedback  con¬ 
trol  system  relying  on  the  joint  encoders  to  adequately  control  the  endpoint  vibration. 
With  command  shaping,  the  residual  vibration  is  approximately  ±2  mm. 

Smooth  trajectories  are  of  importance  for  best  metal  detector  sensitivity,  espe¬ 
cially  for  straight-line  rectangular  motion  as  this  requires  coordinated  motion  of 
the  arm  joints.  As  shown  in  Figure  8,  command  shaping  is  significantly  smoother. 


5.5  6  6.5  7  7.5  8  8.5  9 


Fig.  8  Endpoint  acceleration  at  the  end-effector  for  a  typical  scanning  pass  for  a  straight  line 
scanning  pass  of  2.4  m  with  a  maximum  speed  and  acceleration  of  0.8  m/s  and  2.66  m/s2. 


4.3  In-Field  Testing 

Gryphon  is  an  integrated,  weather-proof  system  built  to  be  robust  against  dust,  hu¬ 
midity  and  rain,  and  resistant  to  extended  temperature  ranges.  It  has  been  field  tested 
for  95  days  on  flat  ground,  bumpy  terrain,  and  slopes.  This  includes  operations  on 
various  test  mine  fields,  including  tests  conducted  under  the  supervision  of  the  Japan 
Science  and  Technology  Agency  (JST)  in  Japan  in  early  2005  [8]  and  early  2006  in 
Croatia  [9].  Two  Gryphon  robots  (each  with  a  different  detector  configuration)  also 
took  part  in  extensive  trials  in  Cambodia  (in  cooperation  with  by  the  Cambodian 
Mine  Action  Center).  Operated  by  Cambodian  deminers,  the  Gryphon  machines  per¬ 
formed  tests  for  more  than  150  hours  of  semi-autonomous  operation. 
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5.  Conclusion 

Field  operations  of  long-reach  manipulators  are  complicated  by  noise,  measure¬ 
ment  bias  and  vibration  that  lead  to  end-effector  positional  uncertainty.  This  dras¬ 
tically  reduces  the  number  of  field  applications  that  can  be  performed  accurately 
and  safely.  The  architecture  described  in  this  paper  combines  a  novel  filtering 
method,  a  decoupled  system-level  calibration  procedure  and  a  vibration  reduction 
technique  to  yield  an  effective  framework  for  obstacle  identification,  trajectory 
planning  and  generation.  Experimental  testing  on  Gryphon  shows  considerable 
deviation  reduction  when  applying  the  framework.  Combined  with  an  advanced 
terrain  following  technique,  it  effectively  avoids  collision  with  the  terrain  for  a 
successful  scanning  operation.  The  methods  demonstrated  improved  automated 
mine  detection  performance  and  tracking  on  the  Gryphon  robot. 
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Abstract  In  this  paper,  we  explore  the  use  of  synthesized  landmark  maps  for  ab¬ 
solute  localization  of  a  smart  wheelchair  system  outdoors.  In  this  paradigm,  three- 
dimensional  map  data  are  acquired  by  an  automobile  equipped  with  high  precision 
inertial/GPS  systems,  in  conjunction  with  light  detection  and  ranging  (LIDAR)  sys¬ 
tems,  whose  range  measurements  are  subsequently  registered  to  a  global  coordi¬ 
nate  frame.  The  resulting  map  data  are  then  synthesized  a  priori  to  identify  robust, 
salient  features  for  use  as  landmarks  in  localization.  By  leveraging  such  maps  with 
landmark  meta-data,  robots  possessing  far  lower  cost  sensor  suites  gain  many  of  the 
benefits  obtained  from  the  higher  fidelity  sensors,  but  without  the  cost.  We  show  that 
by  using  such  a  map-based  localization  approach,  a  smart  wheelchair  system  outfit¬ 
ted  only  with  a  2-D  LIDAR  and  encoders  was  able  to  maintain  accurate,  global  pose 
estimates  outdoors  over  almost  1  km  paths. 

1  Introduction 

We  are  interested  in  developing  smart  wheelchair  systems  capable  of  autonomous 
navigation  in  unstructured,  outdoor  environments.  Our  primary  work  to  date  in  this 
area  has  been  with  the  Automated  Transport  and  Retrieval  System  (ATRS)  [1]. 
ATRS  enables  independent  mobility  for  drivers  in  wheelchairs  by  automating  the 
stowing  and  retrieval  of  the  driver’s  wheelchair  system.  While  ATRS  has  been  com¬ 
mercialized,  and  its  smart-chair  system  does  in  fact  navigate  autonomously,  its  au¬ 
tonomy  is  limited  to  an  area  immediately  adjacent  to  the  host  vehicle.  We  would 
like  to  build  on  these  results  to  support  a  greater  range  of  smart-chair  applications. 
Key  to  this  objective  is  a  robust  means  for  outdoor  localization. 

Localization  is  a  fundamental  enabling  technology  for  mobile  robotics,  and  as  a 
result  a  very  active  research  area.  Although  the  problem  in  structured,  indoor  envi¬ 
ronments  might  be  considered  solved,  robust  localization  outdoors  is  still  an  open 
research  problem.  While  the  community  has  made  significant  strides  recently  in 
terms  of  vehicle  autonomy  outdoors  [2],  much  of  this  has  been  achieved  through 
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sensor  suites  using  tightly  coupled  inertial/GPS  navigation  systems  costing  up  to 
$100K  or  more.  Such  a  solution  is  impractical  in  terms  of  both  size  and  cost  for 
applications  such  as  ours.  In  the  absence  of  reliable  GPS  measurements,  fall-back 
strategies  are  similar  to  those  used  indoors  and  involve  extracting  strong  features 
from  RADAR,  LIDAR,  vision  sensors,  etc.,  and  tracking  their  relative  position  and 
uncertainty  estimates  over  time  [3,  4,  5].  However,  such  approaches  are  more  fragile 
when  used  outdoors  due  to  the  absence  of  continuous  structure  and  the  much  larger 
problem  scale. 

Urban  environments  represent  an  interesting  and  important  middle-ground  as 
over  80%  of  the  U.S.  population  resides  in  cities  and  suburbs  [6].  The  availability 
of  GPS  measurements  in  these  areas  for  pose  estimation  can  typically  be  assumed, 
but  multi-path  errors  from  buildings,  trees,  etc.,  can  significantly  compromise  its 
accuracy.  Fortunately,  these  same  structures  can  be  used  as  landmark  features  to 
yield  accurate  relative  position  estimates.  In  this  paper,  we  investigate  a  paradigm 
where  a  smart  wheelchair  system  relying  upon  lower  cost  sensors  localizes  with  the 
assistance  of  three-dimensional  maps  generated  by  a  vehicle  equipped  with  a  high 
fidelity  sensor  suite.  These  maps  are  synthesized  a  priori  to  identify  robust,  salient 
features  which  can  be  used  as  landmarks  for  robot  localization.  By  leveraging  these 
maps,  the  wheelchair  gains  many  benefits  obtained  with  the  higher-fidelity  sensors 
but  without  the  cost. 

2  Related  Work 

Our  work  relates  to  research  efforts  in  three-dimensional  mapping  as  well  as  robot 
localization  and  mapping.  Generating  three-dimensional  maps  of  urban  areas  has 
been  investigated  by  several  groups,  so  there  is  significant  previous  work  that  can 
be  leveraged  [7,  8,  9,  10,  11,  12].  Unlike  most  of  this  work,  our  focus  is  generating 
and  processing  three-dimensional  maps  with  respect  to  a  global  frame,  which  will  be 
reusable  and  readily  extended  by  any  user.  With  the  explosion  of  data  services,  we 
expect  the  availability  of  such  maps  to  be  commonplace  in  the  future  [13].  Our  mo¬ 
tivation  is  that  by  leveraging  such  maps,  lower  fidelity  sensors  could  be  employed. 
This  has  been  demonstrated  routinely  indoors  ( e.g .,  MCL  with  sonar  vs.  SLAM  with 
LIDAR),  and  we  believe  the  analogy  will  hold  outdoors.  Many  features  in  urban  en¬ 
vironments  are  viable  candidates  for  landmarks.  For  example,  corner  features  are 
often  used  in  EKF  localization  and  mapping  approaches  as  their  position  can  be 
reduced  to  a  single  point  [14].  Building  facades  and  walls  might  also  be  used  [8]. 
Indeed  even  signage  can  be  detected  and  recognized  [15].  While  we  are  ultimately 
interested  in  integrating  aspects  of  each  of  these  features  within  our  synthesized  map 
representation,  in  this  paper  we  limit  our  focus  to  pole  features  as  landmarks.  In  this 
context,  pole  features  would  correspond  to  street  lamps,  trees,  parking  meters,  street 
signs,  etc.  Such  features  are  prevalent  in  most  urban  areas. 

The  use  of  poles  features  as  landmarks  has  been  investigated  by  other  researchers. 
This  includes  the  work  of  [16,  17],  among  others.  The  primary  focus  of  these  efforts 
was  SLAM  with  a  ground  vehicle  (i.e.,  an  automobile)  where  “cylinder”  features 
were  segmented  using  vision  and/or  LIDAR  systems,  and  tracked  over  time.  This 
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technique  enabled  mobile  localization  and  mapping  in  outdoor,  unstructured  envi¬ 
ronments  over  relatively  long  distances  ( e.g .,  100s  of  meters).  We  propose  to  build 
upon  these  efforts  by  first  building  large-scale  three-dimensional  maps,  synthesizing 
these  maps  to  identify  strong  landmark  features,  introducing  a  refinement  stage  to 
improve  map  consistency,  and  then  leveraging  these  maps  with  an  ultimate  goal  of 
improving  localization  performance  outdoors. 


3  Map  Generation 


Data  Acquisition.  Our  vehicle  for  data  acquisition  was  “Little  Ben,”  which  previ¬ 
ously  had  served  as  the  Ben  Franklin  Racing  Team’s  entry  in  the  DARPA  Urban 
Challenge  [18].  Vehicle  pose  was  provided  by  an  Oxford  Technical  Solutions  RT- 
3050,  which  uses  a  Kalman  filter  based  algorithm  to  fuse  inertial  measurements, 
GPS  updates  with  differential  corrections,  and  odometry  information  from  the  host 
vehicle.  It  provides  6-DoF  pose  updates  at  100  Hz  with  a  stated  accuracy  of  0.5 
meters  circular  error  probable  (CEP).  Range  and  bearing  measurements  from  a  pair 
of  roof  mounted,  vertically  scanning  Sick  LMS291-S14  LIDAR  systems  were  then 
registered  to  the  current  vehicle  pose  to  generate  high-resolution  range  maps.  The 
two  LIDARs  are  highlighted  (circled  red)  in  Fig.  1  (Left).  We  used  two  LIDARs  to 
improve  map  reconstruction  by  reducing  scene  occlusion  and  for  redundant  mea¬ 
surements  to  reduce  noise  effects.  During  data  acquisition,  Ben  was  driven  at  8-12 
km/hr.  This  corresponded  to  a  LIDAR  scan  spacing  of  «  4-6  cm,  which  allowed 
even  thin  pole  features  (e.g.,  street  signs,  parking  meters)  to  be  captured  reliably. 


Fig.  1  Development  Platforms.  (Left)  Vehicle  used  for  data  acquisition.  Ben  integrates  an  OXTS 
RT-3050  and  a  pair  of  vertically  scanning  Sick  LMS291-S14  LIDARs  (circled  red).  (Center-Right) 
Our  smart-chair  platform  integrates  LIDAR,  vision,  GPS,  and  odometry  sensors.  Its  computer  in¬ 
terface  and  on-board  power  distribution  enable  a  range  of  sensors  and  accessories  to  be  quickly 
integrated  for  prototyping  purposes. 


LIDAR  Calibration.  Ultimately,  we  need  to  register  the  acquired  range  scans  to  a 
common  world  frame  W.  This  registration  requires  knowledge  of  the  extrinsic  pa¬ 
rameters  (rotation  R  and  translation  T)  of  both  the  vehicle  frame  V,  and  the  front 
and  back  LIDAR  frames  (F,B)  versus  time  with  respect  to  W .  The  vehicle  param¬ 
eters  Ry  (t),Ty  (t)  are  estimated  directly  by  the  OXTS  RT-3050  at  100  Hz.  As  the 
LIDARs  are  related  to  the  vehicle  frame  by  a  rigid  transformation,  we  need  only  re¬ 
cover  the  extrinsic  parameters  of  the  LIDARs  with  respect  to  the  vehicle  frame.  For 
this  work,  we  developed  a  novel  approach  to  simplify  the  calibration  process.  Noting 
that  points  in  the  world  frame  and  LIDAR  frames  were  related  by  rigid  transforma- 
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tions,  we  could  recover  the  LIDAR  extrinsic  parameters  with  a  sufficient  number  of 
point  correspondences  between  the  front  and  back  LIDARs.  To  facilitate  correspon¬ 
dence  tracking  over  time,  6  poles  on  a  «  50  x  100  meter  calibration  loop  were  used 
as  landmarks.  On  each  pole,  two  retro-reflective  targets  were  placed  1  meter  apart 
for  a  total  of  12  target  points  XT  =  [x\,yi,zu- . ,  ,xi2,yi2,zn]T  in  W.  These  targets 
were  automatically  segmented  from  the  environment  by  thresholding  the  LIDAR 
remission  measurements.  A  sample  landmark  pole  is  shown  in  Fig.  2  (left). 

Point  correspondences  were  obtained  by  driving  multiple  cycles  around  our  cal¬ 
ibration  loop.  Our  calibration  process  then  consisted  of  two  stages.  The  first  was 
to  remove  deterministic  error  between  successive  calibration  loops  caused  by  GPS 
jumps.  This  step  was  accomplished  by  using  the  first  loop  to  generate  reference 
landmark  positions  X\  =  [^i,i,yi,i,zi,i ,  • . .  ,^i2,iDi2,i,Zi2,i]r,  where  vy  denotes  the 
estimated  x-position  of  the  Ith  target  in  W  during  the  jth  calibration  loop.  The  deter¬ 
ministic  shift  Sj  =  [sxj,syj,szj]T  of  the  jth  loop  was  estimated  by 

S*j  --  argniin  Xj  n.Sj  nil"  (1) 

where  SjX  12  G  M36  is  the  estimated  deterministic  error  Sj  replicated  for  each  target 
point.  This  minimization  problem  was  solved  using  a  least-squares  approach.  Sj  was 
then  treated  as  a  bias,  and  the  value  of  Xj  for  each  loop  was  adjusted  accordingly. 

The  second  stage  was  to  remove  the  influences  of  random  error  in  the  calibration 
process.  In  doing  so,  we  needed  to  estimate  the  extrinsic  parameters  for  each  LIDAR 
,Rjr,Tp),  as  well  as  the  positions  of  our  12  targets  (Xj)  in  W.  We  note  that 
for  the  same  world  point  x^, 

Rvf  (Rf*f  +  7>V )  +  T&  =  R^b  (Rvbxb  +  Tb)  +  T™  (2) 

where  the  VF  and  VB  subscripts  are  used  to  denote  the  vehicle  transformation  to  the 
world  frame  corresponding  to  the  different  vehicle  poses  when  xw  was  observed  by 
the  front  and  back  LIDARs,  respectively.  Thus,  we  can  solve  for  both  the  LIDAR  ex- 
trinsics  as  well  as  the  target  positions  with  a  minimum  of  16  point  correspondences 
between  the  front  and  back  LIDARs.  Since  the  vehicle  pose  varies  with  each  cali¬ 
bration  loop,  12  unique  correspondences  can  be  obtained  from  each  loop  cycle.  As  a 
result,  a  large  number  of  correspondences  can  be  acquired  very  quickly.  We  solved 
for  (2)  using  a  non-linear  minimization  solver.  However,  one  final  enhancement  was 
added  first  to  remove  measurement  outliers.  For  this,  we  employed  a  “constrained” 
RANSAC  approach  [19],  where  we  instantiated  each  model  hypothesis  with  a  small 
number  (2-4)  of  correspondences  at  random  from  each  target.  This  enhancement 
ensured  that  the  error  residuals  were  balanced  across  the  entire  calibration  loop. 

Fig.  2  shows  representative  results  of  merging  front  and  back  LIDAR  data  using 
the  measured  extrinsics  (center)  and  those  estimated  from  the  calibration  process 
(right).  Improvements  in  data  fusion  and  scene  reconstruction  from  the  calibration 
process  are  clearly  visible.  The  mean  absolute  error  (MAE)  for  the  error  residuals 
between  the  two  LIDAR  reprojections  was  12.67  cm.  As  the  performance  of  the 
pose  system  is  50  cm  CEP,  these  results  were  considered  satisfactory. 
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Fig.  2  (Left)  One  of  six  landmark  poles  used  during  the  LIDAR  calibration  process.  The  retro- 
reflective  targets  could  be  automatically  segmented  to  track  correspondences  across  multiple  cali¬ 
bration  loops.  Reconstruction  results  before  and  after  our  calibration  phase.  Improvements  in  data 
fusion  with  the  front  (red)  and  back  (blue)  LIDARs  from  the  calibration  phase  are  clearly  visible. 

Landmark  Synthesis.  Segmenting  pole  features  was  accomplished  using  a  two- 
step  clustering  approach:  (1)  recursively  cluster  points  within  each  scan,  and  (2) 
merge  clusters  in  successive  scans  where  appropriate.  In  both  steps,  a  Euclidean 
distance  threshold  was  used  as  the  clustering  criterion.  For  a  cluster  to  be  accepted 
as  a  pole  feature,  validation  gates  were  placed  on  cluster  size.  Furthermore,  only 
strongly  vertical  clusters  were  accepted  by  examining  the  covariance  matrix  C  of  the 
associated  feature  points’  positions.  Specifically,  the  eigenvector  associated  with  the 
largest  eigenvalue  Xmax  of  C  should  be  close  to  [0,0,  l]r.  Only  after  clearing  these 
validation  gates  was  the  cluster  accepted  as  a  landmark  in  the  synthesized  map. 

4  Wheelchair  Localization 

The  smart-chair  used  in  this  work  is  based  upon  an  Invacare  M91  Pronto  power 
wheelchair  with  Mk5  electronics.  It  integrates  high  resolution  optical  encoders,  a 
Hokuyo  UTM-30LX  LIDAR  system,  a  1024x768  Point  Grey  digital  video  camera, 
and  a  Garmin  18  WAAS  enabled  GPS  system.  For  this  work,  the  UTM-30LX  was 
the  wheelchair’s  sole  exteroceptive  sensor.  When  compared  to  the  ubiquitous  Sick 
LMS2xx  LIDARs,  it  is  extremely  compact.  In  our  current  integration,  the  LIDAR 
and  camera  system  are  mounted  on  the  opposite  arm  as  the  manual  joystick  con¬ 
troller  as  shown  in  Fig.  1  (center).  The  configuration  is  comparable  in  size  to  the 
joystick  controller  box. 

Landmark  Detection.  In  our  localization  paradigm,  the  wheelchair  employs  LI¬ 
DAR  and  odometry  sensors  in  conjunction  with  the  synthesized  landmark  map. 
Implicit  in  this  approach  is  the  assumption  that  the  landmarks  can  be  reliably  seg¬ 
mented.  However,  unlike  the  landmark  synthesis  phase,  the  wheelchair  LIDAR  must 
rely  entirely  upon  two-dimensional  scan  data.  To  compensate  for  this,  our  land¬ 
mark  detection  strategy  used  two  approaches  dependent  upon  pole  feature  geom¬ 
etry.  The  first  step  in  either  approach  was  clustering  registered  point  returns  from 
the  wheelchair  LIDAR  scan  in  Euclidean  space.  Cluster  diameters  were  then  used 
to  discriminate  between  larger  diameter  pole  features  (trees,  telephone  poles,  street 
lamps,  etc.)  and  narrower  ones  (parking  meters,  traffic  sign  posts,  etc.). 

Larger  diameter  clusters  with  5  or  more  points  were  fit  as  circle  features.  Several 
additional  validation  gates  followed  based  upon  circle  geometry  and  residual  fitting 
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error  before  a  feature  was  accepted  as  a  landmark  candidate.  We  found  empirically 
that  larger  diameter  landmarks  could  be  reliably  detected  at  ranges  ^  75  x  the  fea¬ 
ture  radius,  meaning  that  a  landmark  with  a  radius  of  10  cm  would  typically  be  de¬ 
tected  at  a  range  of  about  7.5  meters.  Circle  fitting  was  not  appropriate  for  smaller 
diameter  clusters.  As  such,  these  features  were  tracked  over  time.  If  they  were  per¬ 
sistent  and  no  other  clusters  were  detected  within  a  given  distance  threshold,  they 
were  accepted  as  landmark  candidates.  Using  such  an  approach,  smaller  diameter 
features  could  reliably  be  detected  at  ranges  <  5  meters.  Candidate  landmarks  were 
passed  to  the  data  association  module  for  additional  processing. 

Data  Association.  There  are  inherent  limitations  in  using  two-dimensional  LIDAR 
measurements  to  segment  three-dimensional  landmarks.  As  a  result,  the  landmark 
detection  process  may  erroneously  validate  other  environmental  features  as  pole  fea¬ 
tures.  The  impact  of  these  false  positives  on  localization  performance  was  mitigated 
through  a  data  association  phase. 

Several  sources  of  uncertainty  exist  in  the  synthesized  landmark  locations  within 
our  map.  These  sources  include  uncertainty  introduced  by  the  pose  system,  errors 
in  LIDAR  extrinsic  calibration,  LIDAR  range  errors,  and  noise  associated  with  the 
landmark  synthesis  process  itself  to  name  but  a  few.  Uncertainty  in  landmark  posi¬ 
tion  was  modeled  by  associating  a  covariance  matrix  Zi  with  the  position  of  each 
landmark  while  Zw  and  ZQ  denoted  covariance  matrices  associated  with  the  uncer¬ 
tainty  in  wheelchair  pose  and  LIDAR  range  and  bearing  observations,  respectively. 
With  these  so  defined,  we  used  the  Mahalanobis  distance  D  between  the  predicted 
and  observed  sensor  measurements  as  our  quality  metric  for  data  association,  de¬ 
fined  as 

D  =  Jz  r(£0  +H^HZ  +/Wf;r)-1  z  (3) 

where,  Hw  and  Hj  are  the  Jacobians  of  the  observation  model  with  respect  to 
wheelchair  pose  and  landmark  location,  respectively, 
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and  z  =  z/  —  zQ  is  the  difference  between  the  predicted  and  actual  range  and  bearing 
measurements  for  the  wheelchair  LIDAR.  A  threshold  on  D  served  to  filter  out  po¬ 
tential  false  positives  observed  during  the  landmark  detection  phase.  For  the  case  of 
closely  located  landmarks  where  multiple  possible  detection-landmark  associations 
might  be  possible,  the  association  minimizing  the  total  assignment  cost  was  used. 

Localization  Approach.  Extended  Kalman  Filters  (EKFs)  have  been  one  of  the 
most  popular  techniques  for  state  estimation  in  mobile  robotics  [3,  20,  14],  and 
we  took  a  similar  approach  for  estimating  the  wheelchair  pose  x  =  [xw,yw,0w]T . 
In  the  prediction  step,  linear  and  angular  velocities  (v,  c 0 )  were  estimated  from  the 
encoders  using  a  differential  drive  model  for  the  wheelchair.  For  the  correction  step, 
the  observation  functions  were  based  upon  LIDAR  estimates  for  the  range  zr  and 
bearing  Za  to  the  segmented  landmark  at  position  (jt/,y/) 
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zr  =  J £  -  ^)2  +  bw  -  y/)2,  =  arctan  ( — ]  —  Qw  (5) 

and  the  Kalman  gain  was  calculated  as  K  =  PHwr  (HWPHWT  +  HiEiHi7  +Z0)_1 
where  and  77/  were  as  defined  in  (4).  The  process  then  followed  a  traditional 
EKF  implementation  with  updates  of  2-5  Hz  dependent  upon  vehicle  velocity. 
Landmark  Position  Refinement.  A  shortcoming  with  relying  heavily  upon  GPS 
for  map  generation  is  that  changes  in  satellite  geometry/visibility  can  lead  to 
“jumps”  in  vehicle  pose.  These  discontinuities  affect  map  consistency.  One  ap¬ 
proach  to  address  this  would  be  to  integrate  additional  sensing  onto  the  data  ac¬ 
quisition  platform  and  run  SLAM  in  parallel  with  the  data  acquisition  phase  [12]. 
We  propose  an  alternate  refinement  stage  where  SLAM  is  actually  run  by  the  map 
client  -  in  our  case  the  wheelchair  -  during  an  initial  route  traversal  akin  to  a  learn¬ 
ing  phase.  This  is  something  we  envision  would  be  done  by  the  wheelchair  user’s 
care  provider  prior  to  enabling  completely  autonomous  operations.  An  advantage  of 
this  approach  is  that  the  landmark  refinement  would  be  tuned  to  the  actual  sensor  ge¬ 
ometries  employed  by  the  client  vehicle.  For  our  implementation,  we  extended  our 
EKF  localization  using  a  SLAM  approach  as  in  [3,  20]  to  further  refine  the  landmark 
positions.  The  landmark  locations  were  then  updated  with  the  SLAM-refined  land¬ 
mark  positions  and  covariance  El  estimates.  While  this  did  not  improve  the  global 
map  accuracy,  it  significantly  improved  the  map  consistency. 

5  Experimental  Results 

To  investigate  the  viability  of  the  proposed  approach,  our  first  experiments  were 
conducted  in  the  parking  lots  around  Lehigh’s  Stabler  Arena.  Admittedly,  this  area 
was  not  representative  of  urban  environments.  However,  it  served  as  a  low-traffic 
proving  ground  with  sufficient  pole  features  to  first  validate  the  concept.  Fig.  3  (left) 
shows  the  raw  registered  range  data  acquired  by  driving  Ben  through  the  area.  These 
data  were  then  synthesized  as  outlined  above,  and  the  resulting  map  with  embedded 
landmarks  is  shown  at  Fig.  3  (right).  Validating  the  fidelity  of  this  reconstruction  is 
difficult  due  to  the  lack  of  absolute  ground  truth.  However,  we  measured  the  dis¬ 
tance  between  25  pairs  of  landmarks  using  a  Bosch  DLE50  laser  distance  measure 
and  compared  these  to  the  distances  of  corresponding  synthesized  landmark  pairs. 
The  mean  absolute  difference  between  the  sets  was  7.2  cm.  We  also  reviewed  the 
reliability  of  the  landmark  synthesis  approach.  All  71  pole  features  present  in  the 
area  surveyed  were  positively  detected  and  synthesized  into  the  map. 


Fig.  3  Registered  raw  (left)  and  synthesized  map  data  (right).  The  relative  distance  differences 
between  synthesized  landmark  pairs  and  their  real-world  counterparts  was  about  7  cm. 
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Using  the  original  synthesized  map  and  landmark  positions,  the  wheelchair  was 
manually  driven  over  a  route  network  960  meters  in  length  at  a  fast  walking  speed 
(1.6  m/s).  This  first  loop  constituted  the  landmark  refinement  stage  discussed  in 
Section  4,  and  the  wheelchair  localized  using  an  EKF  SLAM  approach  with  the 
segmented  landmark  positions.  Using  SLAM,  the  wheelchair  was  able  to  accurately 
maintain  its  pose  over  the  entire  960  meter  loop.  We  then  repeated  this  same  ex¬ 
periment  3  separate  times  using  map-based  localization  with  the  updated  landmark 
position  and  uncertainty  estimates.  All  other  parameters  for  data  association  and  lo¬ 
calization  remained  fixed.  Representative  results  are  at  Lig.  4  (left).  The  landmark 
positions  are  denoted  by  red  circles.  The  wheelchair  path  as  estimated  by  the  map- 
based  localization  approach  is  denoted  by  the  blue  line.  The  path  as  estimated  by 
the  wheelchair’s  own  GPS  is  also  shown  for  comparison  purposes  (green  line).  The 
initial  pose  estimate  of  the  GPS  was  also  used  to  initialize  the  pose  for  map  local¬ 
ization.  Using  the  SLAM-refined  landmark  positions,  all  3  trials  were  successfully 
completed.  To  characterize  the  localization  accuracy,  the  wheelchair  was  driven  over 
6  ground-truth  points  (shown  as  “+”)  whose  positions  relative  to  landmarks  were 
measured  by  hand.  The  average  position  errors  was  20  cm,  with  3  <  la,  5  <  2a, 
and  all  6  <  3a  based  upon  the  covariance  estimates  for  Zw  and  Z/. 


Fig.  4  Localization  results  using  refined  (left)  and  original  landmark  position  estimates  (right). 
Improving  the  consistency  of  landmark  positions  dramatically  improved  localization  performance. 


To  motivate  the  need  for  the  landmark  refinement  phase,  we  also  ran  these  same 
trials  using  map-based  localization  (not  SLAM)  with  the  original  landmark  posi¬ 
tions.  Each  of  these  trials  ended  in  failure.  This  typically  occurred  at  a  portion  of 
the  course  where  the  inter-landmark  spacing  required  the  wheelchair  to  rely  upon 
dead  reckoning  for  over  20  meters  of  travel.  Significant  error  and  uncertainty  in 
wheelchair  pose  accumulated  during  this  time,  resulting  in  an  incorrect  feature  as¬ 
sociation.  This  is  shown  in  Lig.  4  (right).  However,  the  open-loop  travel  was  not 
the  sole  culprit.  Lrom  a  subsequent  analysis,  we  determined  that  a  fairly  significant 
GPS  shift  occurred  during  the  data  acquisition  phase  for  building  the  map.  As  a  re¬ 
sult,  a  fraction  of  the  map  exhibited  a  shift  >  1  meter  with  respect  to  the  maps  initial 
coordinate  frame.  This  shift  significantly  contributed  to  the  data  association  failure, 
and  the  correct  robot  pose  could  not  be  recovered  by  the  EKP.  As  the  SLAM  algo¬ 
rithm  updated  the  landmark  positions  on  the  fly,  it  was  robust  to  this  shift  error.  The 
subsequent  landmark  refinement  stage  mitigates  the  impact  of  GPS  jump. 
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Our  final  experiment  involved  a  similar  test  in  South  Bethlehem,  PA  adjacent  to 
Lehigh’s  campus.  This  was  a  representative  urban  environment,  with  a  significantly 
higher  density  of  landmarks  than  seen  during  the  Stabler  testing.  During  this  test, 
the  wheelchair  was  manually  driven  approximately  720  meters  along  the  sidewalk  at 
a  velocity  of  ~  1  m/s.  Results  from  this  trial  using  SLAM  are  shown  at  Fig.  5,  where 
the  landmarks  positions  (red  circles),  the  SLAM  estimated  path  (blue  line)  and  GPS 
path  (green  line)  are  superimposed  on  a  satellite  image.  While  completing  the  loop 
was  prevented  by  ongoing  building  construction,  the  end  position  was  consistent 
with  localization  estimates.  Again,  results  compared  favorably  to  the  wheelchair’s 
WA AS -corrected  GPS  estimate.  We  have  not  been  able  to  repeat  this  trial  using 
localization  with  the  SL AM-enhanced  map  due  to  seasonal  weather  conditions,  but 
expect  to  in  the  near  future. 


Fig.  5  Map-based  localization  of  the  wheelchair  (blue  line)  vs.  GPS  position  estimates  (green  line). 

6  Discussion 

In  this  paper,  we  investigated  the  acquisition,  synthesis  and  application  of  three- 
dimensional  maps  by  a  smart  wheelchair  for  map-based  localization.  Since  the  maps 
were  registered  to  a  global  frame,  they  provide  a  means  for  absolute  position  estima¬ 
tion  in  urban  areas  even  in  the  absence  of  GPS.  In  our  experiments,  our  wheelchair 
system  was  able  to  maintain  accurate  pose  estimates  after  traveling  hundreds  of  me¬ 
ters  using  such  an  approach.  While  we  are  satisfied  with  the  results  to  date,  we  do 
realize  this  is  just  a  first  step.  Pole  features  were  an  obvious  first  choice  for  land¬ 
marks,  and  we  are  now  beginning  to  synthesize  additional  features  into  the  map 
(. e.g .,  building  corners).  We  are  also  interested  in  vision  based  signage  detection,  as 
these  can  provide  nearly-unique  IDs  for  inferring  global  position.  We  also  assume 
the  ability  to  automatically  segment  pedestrians  from  the  environment.  Our  current 
implementation  fuses  results  from  vision  and  LIDAR  systems.  The  camera  uses  the 
Haar-like  feature  based  classifier  for  face  detection  from  OpenCV  [21],  while  the 
LIDAR  segments  candidate  clusters  based  upon  geometry  constraints.  Individually, 
both  systems  have  high  rates  of  false  positives.  However,  this  can  be  reduced  dra- 
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matically  by  only  accepting  tracks  when  both  sensors  report  a  detection.  A  downside 
is  that  significant  false-negatives  remain.  We  are  continuing  to  refine  this  approach. 
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Tethered  Detachable  Hook  for  the  Spiderman 
Locomotion 

(Design  of  the  Hook  and  its  Launching  Winch) 


Nobukazu  ASANO,  Hideichi  NAKAMOTO,  Tetsuo  HAGIWARA,  Shigeo 
HIROSE 


Abstract  This  paper  introduces  a  new  concept  of  ’’tethered  detachable  hook  (TDH)” 
and  its  launching  winch.  TDH  system  is  the  device  which  will  be  mounted  on  a  mo¬ 
bile  robot  and  enhances  its  travers ability  over  extremely  hostile  terrain  by  launching 
detachable  hook  to  nearby  objects,  producing  large  traction  force  by  the  tether  and 
detaching/recovering  the  hook  to  the  launcher  again.  In  this  paper  the  authors  first  of 
all  introduce  several  prototype  models  of  the  TDH.  We  then  discuss  the  design  of  lat¬ 
est  model  which  features  pneumatic  detaching  mechanism,  the  pneumatic  launcher 
and  the  reel  mechanism  having  three  motion  states;  active  rotation,  free  rotation  and 
braking.  Finally,  the  result  of  several  experiments  of  constructed  TDH  model  will 
be  explained. 


1  Introduction 

Many  types  of  mobile  robots  have  been  developed  so  far  to  move  on  off-the-road 
terrains,  such  as  modified  wheel,  track,  legs,  and  snake-like  configuration.  Even 
jumping  can  be  considered  as  one  of  the  means  for  high  mobility  [1].  However, 
if  long  and  steep  slope  or  ditch  much  wider  than  the  size  of  the  robots  is  on  the 
way,  terrain  adaptability  of  these  conventional  methods  is  not  enough.  In  this  paper, 
we  propose  a  new  type  of  locomotion  method  which  assists  the  mobility  of  these 
mobile  robots.  It  consists  of  ’’tethered  detachable  hook”  and  its  launcher  and  winch 
system,  which  assists  the  mobile  robot  as  shown  in  Fig.  1(a).  Here  the  mobile  robot 
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is  going  to  climb  the  steep  slope  and  the  ’’tethered  detachable  hook”  is  launched  to 
the  branch  of  a  tree.  When  the  hook  is  connected  to  the  branch,  the  winch  winds  the 
tether  and  produces  large  traction  force  to  assist  the  robot  to  go  over  the  steep  slope. 
After  the  motion  is  over,  the  hook  is  detached  from  the  branch  by  changing  the  shape 
of  the  hook  to  smooth  and  linear  shape,  rewinding  the  winch,  and  finally  restoring 
the  hook  in  the  launcher  again  to  prepare  for  the  next  launching  task.  If  the  mobile 
robot  has  more  than  two  tethered  detachable  hooks  and  their  launcher- winch  system, 
it  can  even  lift  itself  from  the  ground  and  move  from  branch  to  branch  as  shown  in 
Fig.  1(b).  This  is  like  the  motion  of  long-armed  ape  in  forest  or  the  spiderman  flying 
from  building  to  building. 

Until  now  several  tethered  robots  were  already  proposed,  such  as  TITAN  VII  [2] 
or  DANTE  II  [3].  They  are  supported  by  tethers  which  are  anchored  beforehand 
at  the  top  of  the  slopes.  Cliffbot  [4]  is  supported  by  an  anchor  robot  which  stays 
at  the  top  of  the  cliff  and  connected  by  the  tether.  Casting  manipulator  [5]  has  the 
tether  with  a  gripper  at  the  end  and  casted  to  catch  an  object.  Although  the  objective 
of  this  casting  manipulator  was  for  the  manipulation  of  a  remotely  located  object, 
the  concept  can  easily  be  extended  to  the  supporting  system  for  mobile  robot.  The 
automated  tether  management  system  [6]  is  most  closely  related  to  our  concept.  It 
used  a  tether  with  a  gripper  which  can  be  remotely  operated  to  lock  or  detach  it 
and  help  the  flying  motion  of  a  space  robot.  But  as  it  is  designed  for  the  activity  in 
micro-gravity  environment,  the  system  can  not  directly  apply  for  the  application  of 
field  robotics  which  we  are  targeting  in  this  paper. 

This  paper  is  organized  as  follows.  Section  2  describes  the  design  of  former  mod¬ 
els  of  TDH.  Section  3  presents  design  of  the  latest  model  of  TDH  and  its  launching 
winch.  Section  4  reports  experimental  results  of  the  constructed  TDH  and  its  launch¬ 
ing  winch.  Section  5  concludes  the  results  and  proposes  future  works. 


2  Former  Models  of  Tethered  Detachable  Hook 
2.1  Model  I 

As  the  first  model  of  the  ’’tethered  detachable  hook”,  the  authors  developed  the 
Model  I  as  shown  in  Fig. 2(a).  Although  we  call  it  as  ’’tethered  detachable  hook”,  we 


(a)  Slope  climbing  by  using  TDH  (b)  Ditch  crossing  by  using  a  pair  of  TDHs 


Fig.  1  Concept  of  tethered  locomotion  by  using  tethered  detachable  hook(TDH) 
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did  not  selected  the  hook  but  gripper.  As  shown  in  Fig. 2(a)  and  (b),  the  gripper  is 
designed  to  grip  an  object  when  the  tip  end  of  the  rod  contact  the  object  and  hold 
its  gripping  state  by  the  ratchet  mechanism.  Ability  to  hold  the  object  tight  was  the 
main  reason  why  we  selected  gripper  configuration  for  this  first  model.  Release  of 
the  gripping  motion  is  designed  to  be  done  by  a  mechanical  memory  system.  The 
mechanical  memory  system  is  already  used  in  the  ballpoint  pen  with  different  colors. 
It  consists  of  a  cylindrical  cam  with  zigzag  grooves  (Fig. 2(c))  in  which  the  pin  fixed 
to  the  external  cylinder  is  inserted.  When  the  tension  of  the  tether  connected  to  the 
cylindrical  cam  changes  and  drives  the  cam  to  make  reciprocating  motion,  the  cam 
is  driven  by  the  pin  and  starts  to  rotate  in  one  direction.  In  the  three  zigzag  grooves, 
one  of  the  grooves  is  made  longer,  so  the  ratchet  release  rod  is  inserted  in  the  ratchet 
trigger  to  release  the  ratchet  and  open  the  finger  every  three  times  of  the  pulling 
motion  of  the  tether. 

For  this  Model  I  we  also  made  simple  launcher  and  made  the  experiment  to  cast 
it  to  the  branch.  Once  it  is  gripped  the  branch,  it  showed  strong  connection  and 
release  motion  was  also  very  smooth.  But  the  problem  of  this  first  model  was  its 
difficulty  of  aiming  at  the  target  object  (branch).  The  gripper  should  be  aimed  at  the 
object  precisely  in  position  and  also  in  orientation;  otherwise  the  gripper  could  not 
hold  the  object  successfully.  This  is  the  big  problem  if  we  hope  to  make  automatic 
launching  system.  Another  problem  of  this  first  model  was  the  shape  of  the  gripper 
in  open  state.  It  is  not  streamlined  and  there  is  always  the  danger  to  be  stacked  in 
narrow  gap. 


2.2  Modelll 

To  solve  the  difficulty  of  precise  aiming  of  the  target,  we  selected  hook  for  the 
following  models.  The  Model  II  of  the  tethered  detachable  hook  is  shown  in  Fig. 3. 
As  shown  in  Fig. 3,  the  Model  II  also  adopted  the  mechanical  memory  system,  and  it 
could  be  released  by  pulling  the  tether  for  few  times.  Repeated  traction  of  the  tether 


Release  points 
(b)  Gripping  motion  (c)  Zigzag  cam  motion 


Fig.  2  Mechanism  of  the  TDH  model  I 
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rotate  the  cylindrical  cam  and  it  drives  the  lock  lever,  and  release  the  stopper  to  open 
the  claws. 

The  Model  II  has  four  claws  attached  radially  at  the  end  of  the  hook  so  that  the 
hook  can  be  anchored  to  the  target  object  much  easier  than  the  gripper.  It  can  be 
hooked  the  target  object  only  by  throwing  it  over  the  target.  The  Model  II  was  much 
easier  to  connect  to  the  target  objects  than  former  model,  however  it  still  remained 
several  problems.  One  of  them  is  its  weight.  As  there  are  four  claws,  it  is  heavy  and 
powerful  launcher  is  needed.  The  second  problem  is  the  shape  of  the  hook  in  release 
state.  Although  the  shape  is  more  streamlined  than  that  of  Model  I,  the  shape  is  still 
in  wedge  like  and  there  remained  the  possibility  to  be  stack  in  narrow  gap  while  it  is 
recovering  to  the  launcher.  The  third  problem  is  the  possibility  of  mal-operation  of 
the  mechanical  memory  system,  for  the  tether  will  always  be  affected  by  accidental 
pulling  and  releasing  motion  and  it  may  be  released  by  chance. 


2.3  Model  III 

To  solve  the  problems  mentioned  above,  we  developed  Model  III.  The  model  III  has 
three  important  modifications  as  follows; 

1 .  reduce  the  number  of  claws  from  four  to  one 

2.  change  the  shape  of  the  hook  as  a  simple  rod  in  the  release  state 

3.  introduce  active  detaching  mechanism 

Modification  1  is  done  to  reduce  the  weight  of  the  hook.  We  selected  four  claws 
configuration  for  the  Model  II  to  secure  reliable  anchor  action  in  any  posture  of 
the  hook.  However,  we  found  that  even  one  claw  hook  can  exhibit  similar  action 
only  by  adding  enough  length  and  weight  to  the  claw.  Effect  of  this  configuration 
is  observed  in  the  experiment  of  Fig.  10.  In  this  experiment,  when  the  hook  pulled 
slowly  over  a  branch  (in  this  case  a  pipe),  the  hook  will  rotate  around  its  stem  and 
let  the  claw  lower  on  the  branch  as  the  claw  is  heavy,  and  thus  the  claw  grips  the 
branch. 

Modification  2  is  done  to  minimize  the  stack  action  while  the  hook  is  in  retract¬ 
ing  state.  As  shown  in  Fig.4,  the  hook  is  designed  to  change  from  F-shaped  state 
to  linear- shaped  state.  Difficulty  of  realizing  this  shape  was  in  the  joint  design  of 


Fig.  3  Mechanism  of  the  TDH  model  II 
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the  claw.  As  large  torque  is  applied  at  the  joint  to  support  large  traction  force  of 
the  tether,  the  joint  mechanism  has  to  produce  large  torque  and  the  joint  tends  to  be 
bulky  and  heavy.  To  solve  this  problem,  we  introduced  tether  supported  joint  mech¬ 
anism.  In  this  mechanism,  the  end  of  the  tether  is  connected  to  the  claw  and  joint 
torque  is  directly  supported  by  the  traction  of  tether  as  shown  in  Fig.4.  The  tether  is 
designed  to  go  out  of  the  joint  to  produce  large  torque  in  this  hooked  state.  In  the 
linear-shaped  state,  to  the  contrary,  the  tether  retracts  inside  the  joint  and  the  joint  is 
made  slender. 

Modification  3  is  done  to  eliminate  the  expected  malfunction  of  the  release  mo¬ 
tion  depending  upon  the  accidental  pulling  action  of  the  tether.  The  mechanical 
memory  system  was  ideal  because  normal  rope  can  be  used  as  the  tether  and  lock- 
and-release  mechanism  of  the  hook  in  Model  II,  or  gripper  in  Model  I  could  be 
made  comparatively  simple.  To  make  comparative  system  with  high  reliability,  we 
considered  electric  and  pneumatic  types  of  trigger  driving  mechanisms. 

Design  of  the  electric  detachable  mechanism  is  shown  in  Fig. 5.  A  stopper  is 
connected  to  the  tether  and  the  stopper  is  locked  by  a  trigger  that  is  fixed  by  the 
rod  (Phase  1).  In  this  state,  the  hook  holds  L  shaped  configuration  and  act  as  an 
anchor.  Release  motion  of  the  hook  is  done  by  rotating  the  screw  by  the  small  motor 
and  slides  the  rod  fixing  the  rotation  of  the  trigger.  This  motion  frees  the  stopper 
and  the  tether  automatically  slides  to  open  the  claw  in  a  release  state  by  the  spring 
attached  around  the  joint  (Phase  2).  Required  electric  current  is  very  small  and  it 
can  be  supplied  by  small  diameter  electric  wire  inside  the  tether.  As  the  tether  has 
to  support  large  traction  force,  we  used  a  Kevlar®  fiber-reinforced  wire  together 
with  the  fine  electric  wires.  The  authors  have  already  used  it  as  the  ’’Hyper  Tether” 
system  [7]  and  Anchor  climber  [8]. 


Fig.4  Basic  structure  of  the  TDH  model  III 


Fig.  5  Electric  detach  type  of  the  TDH  model  III 
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A  pneumatic  detachable  mechanism  is  shown  in  Fig. 6.  The  tether  consists  of  air 
tube  and  wire,  and  the  end  of  the  tube  is  connected  to  a  small  air  bag  which  is  located 
inside  a  stopper.  The  stopper  can  slide  inside  a  pipe  fixed  to  the  hook  and  the  stopper 
is  locked  by  the  trigger  as  shown  in  Fig. 6  (Phase  1).  The  wire  inside  the  air  tube  is 
connected  to  the  stopper  and  the  end  of  the  wire  is  connected  to  the  claw.  Detaching 
motion  of  the  hook  can  be  done  by  supplying  pressurized  air  in  the  air  tube  and 
expand  the  airbag.  Expanded  airbag  pushes  out  the  projection  point  of  the  bag  and 
drive  the  trigger  out  of  the  hole  and  release  the  stopper  (Phase  2).  It  enables  the 
stopper  and  the  wire  connected  to  slide  freely  and  open  the  claw.  Although  the  gap 
between  air  tube  and  wire  (polyethilene  line)  is  small,  we  found  that  the  pressurized 
air  could  easily  be  transmitted  along  the  air  tube  longer  than  10[m]  with  ease. 


Table  1  Specifications  of  the  TDH  model  II  &  III 


Type 

Mass 

Total  length 

Length  of  claw 

Model  II 

840  [g] 

435  [mm] 

85  [mm] 

Model  III  (Electric) 

395  [g] 

325  [mm] 

150  [mm] 

Model  III  (Pneumatic) 

390  [g] 

345  [mm] 

150  [mm] 

We  successfully  made  both  types  of  detachable  mechanisms  and  verified  their 
motions.  Specifications  of  these  types  are  shown  in  Table  1  with  those  of  Model  II. 
Between  these  types,  we  selected  the  pneumatic  type,  because  tether  of  the  pneu¬ 
matic  type  can  be  lighter  and  driving  mechanism  of  the  hook  can  be  lighter  and 
rugged  enough  to  be  protected  against  the  shock. 


3  Design  of  Launching  Winch  for  the  Tethered  Detachable  Hook 

A  casting  device  developed  for  TDH  Model  III  is  shown  in  Fig. 7.  It  consists  of  a 
launcher  for  the  hook  and  a  winch  to  wind  the  tether. 
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Fig.  6  Pneumatic  detach  type  of  the  TDH  model  III 
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First  we  describe  a  launcher.  Among  the  spring  type  and  pneumatic  cylinder 
type,  we  found  that  the  pneumatic  one  is  better  because  it  can  generate  powerful 
and  high  speed  launching  motion  with  lightweight  mechanism.  We  already  adopted 
pneumatic  hook  detachable  system  and  selection  of  the  pneumatic  system  for  the 
launcher  will  have  other  effect  to  make  the  total  system  simple.  One  of  the  most 
important  parts  of  the  launching  system  is  in  its  trigger  mechanism,  and  designed 
mechanism  is  shown  in  Fig. 8.  At  first,  a  piston  is  locked  by  a  ball  type  trigger,  and 
is  pressed  by  high-pressured  air  from  ’’Port  A”  as  shown  in  Fig. 8(a).  At  this  time, 
high-pressured  air  from  ’’Port  B”,  that  is  for  the  control  rod,  is  also  supplied.  Launch 
motion  can  be  done  by  decompress  the  air  for  ’’Port  B”  and  drive  the  control  rod  out 
of  the  balls  and  let  the  air  pressure  from  ’’Port  A”  drives  the  piston  go  right  direction 
and  launch  the  hook(Fig.8(b)).  Compared  with  the  trigger  mechanism  using  normal 
valve,  introduced  mechanism  can  makes  the  trigger  motion  smoothly  and  as  the 
pressurized  air  gives  pressure  to  the  hook  from  the  beginning,  it  can  increase  the 
initial  speed  of  the  hook  and  enable  it  to  cast  in  longer  distance. 

Next  we  explain  a  winch  for  the  TDH.  It  is  designed  to  have  three  modes;  drive 
mode,  free  mode  and  brake  mode. 

The  ’’drive  mode”  is  used  when  it  is  used  as  winch,  and  large  traction  force  should 
be  generated  to  support  a  robot.  The  ’’free  mode”  is  used  when  the  hook  is  going 
to  be  launched.  As  the  spool  have  to  rotate  in  high  speed,  the  actuator  to  produce 
large  traction  force  in  drive  mode  should  be  mechanically  disconnected.  The  ’’brake 
mode”  is  needed  for  two  reasons,  one  of  them  is  to  support  the  suspended  robot 
without  energy  loss  and  the  other  is  to  adjust  the  rotational  speed  of  the  winch 
when  it  is  in  ’’free  mode”  and  launching  the  hook.  As  the  hook  is  launched  by 


Fig.  7  Overview  of  a  launch¬ 
ing  winch  with  pneumatic 
TDH  model  III 


Fig.  8  Pneumatic  trigger  mechanism  of  the  launcher 
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pneumatic  pressure,  winch  in  free  mode  tends  to  keep  rotating  while  the  hook  is 
flying.  However  the  speed  of  the  hook  decelerates  while  flying  and  the  tether  tends 
to  excessively  goes  out  of  the  reel  and  entangle  around  the  reel.  This  phenomenon 
is  called  as  ’’backrush”  among  anglers  for  their  fishing  reel  motion.  To  prevent  the 
backrush  we  need  proper  braking  of  the  winch  in  the  free  mode. 

To  switch  these  3  modes,  we  adopted  a  multi-plate  clutch  mechanism  which  is 
installed  inside  the  spool  as  shown  in  Fig. 9.  Multiple  input  and  output  clutch  plates 
are  piled  up  to  increase  the  braking  torque.  Maximum  torque  of  this  clutch  mecha¬ 
nism  Tc  can  be  estimated  as  follows; 

Tc  =  NpjlFpre  (1) 

where,  Np  is  the  number  of  friction  surfaces  between  clutch  plates,  fl  is  the  coeffi¬ 
cient  of  static  friction,  Fp  is  a  pushing  force  for  clutch  plates  generated  by  the  motor 
thrusting  force  and  re  is  an  effective  radius  of  friction  surfaces. 

As  the  winch  rotate  infinitely  and  pressurize  air  have  to  be  supplied  to  the  air  tube 
to  connect  the  hook,  a  rotary  pneumatic  joint  is  introduced.  Air  is  supplied  from  a 
joint  in  right  section  of  the  figure,  and  it  pass  through  holes  on  a  hollow  shaft.  Two 
movable  O-rings  are  installed  to  prevent  the  leak  of  the  air. 


Clutch-control 

motor 
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Wave  generator 
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Fig.  9  Mechanism  of  the  winch 
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4  Experiment 


The  authors  confirmed  motion  performance  of  tethered  detachable  hook  of  Model 
III.  Basic  motion  to  hook  the  object  was  examined  as  shown  in  Fig.  10.  As  is  dis¬ 
cussed  before,  the  claw  was  automatically  lowered  and  gripped  the  branch  when  the 
hook  was  pulled  and  located  above  the  object.  From  this  experiment,  we  confirmed 
the  validity  of  introducing  the  one  claw  configuration  for  the  TDH. 

Next,  the  authors  made  a  simple  experiment  of  casting  the  Model  III  TDH  to  the 
real  branch  of  the  tree  as  shown  in  Fig.  1 1 .  From  this  experiment,  the  anchoring  func¬ 
tion  of  the  hook  and  its  detaching  motion  was  successfully  demonstrated.  Besides, 
smooth  collection  was  achieved  because  of  its  straight  shape  after  the  detaching 
motion. 

With  the  casting  device  mentioned  above,  the  authors  also  made  the  experiments 
to  verify  the  effectiveness  of  the  prevention  of  ’’backrush”  by  the  braking.  Fig.  12(a) 
shows  the  comparison  of  the  tether  on  the  after  launching  the  hook.  Left  is  the  result 
without  braking  and  right  is  the  result  with  proper  braking.  From  the  comparison 
of  these  results,  we  know  the  importance  of  the  braking  of  the  winch  in  free  mode. 
Fig.  12(b)  shows  measurement  results  of  outer  circumferential  velocity  of  the  winch. 
They  were  measured  by  an  encoder  connected  to  the  winch.  From  this  figure  too,  we 
can  know  that  proper  braking  enable  to  increase  the  launching  speed.  In  Fig.  12(b) 


Fig.  10  Sequential  motion  to 
show  the  self  adjustment  of 
the  claw  direction  to  the  target 
branch 


Fig.  11  Real  launching  ex¬ 
periment  of  the  TDH  model 
III  to  the  branch  of  a  tree 


(a)  Comparison  of  the  tether  with  and  without  braking  control 
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(b)  Measurement  results  of  outer  circumferential  velocity  of  the  winch 


Fig.  12  Comparison  experiment  with  and  without  braking  control 
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we  can  observe  the  speed  change  of  the  reel  at  the  time  near  the  1 .2  [sec] .  It  is  caused 
by  the  falls  of  the  hook  on  the  ground. 


5  Conclusions  and  Future  Works 

This  paper  introduces  a  new  concept  of  ’’tethered  detachable  hook  (TDH)”  and  its 
launching  winch  for  use  as  the  locomotion  assisting  device  for  mobile  robots.  This 
paper  firstly  discusses  about  several  prototype  models  of  the  TDH  and  elaborate  lat¬ 
est  model,  such  as  pneumatic  lock  and  release  mechanism  of  the  hook,  pneumatic 
launcher  and  the  reel  mechanism  which  exhibits  three  motion  states;  active  rota¬ 
tion,  free  rotation,  and  braking  of  the  rotation.  Performance  of  developed  TDH  and 
its  launching  winch  are  successfully  demonstrated  by  the  constructed  mechanical 
model.  Study  of  the  proposing"  tethered  detachable  hook  (TDH)  "  is  still  at  the 
starting  point,  and  there  remained  many  interesting  research  subjects  to  be  studied 
on  the  hook,  launcher,  and  winch  mechanisms  and  their  control.  We  are  hoping  to 
study  further  on  these  points  and  realize  the  mobile  robots  having  TDH  and  move 
around  mountainous  area  or  disaster  site  by  successively  casting  the  tethers  around 
the  environment  just  like  Spiderman  does  among  buildings  in  near  future. 
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Abstract  This  paper  describes  the  design  of  an  optimal-control-based  active 
safety  framework  that  performs  trajectory  planning,  threat  assessment,  and  semi- 
autonomous  control  of  passenger  vehicles  in  hazard  avoidance  scenarios.  The  ve¬ 
hicle  navigation  problem  is  formulated  as  a  constrained  optimal  control  problem 
with  constraints  bounding  a  navigable  region  of  the  road  surface.  A  model  predic¬ 
tive  controller  iteratively  plans  an  optimal  vehicle  trajectory  through  the  con¬ 
strained  corridor.  Metrics  from  this  “best-case”  scenario  establish  the  minimum 
threat  posed  to  the  vehicle  given  its  current  state.  Based  on  this  threat  assessment, 
the  level  of  controller  intervention  required  to  prevent  departure  from  the  naviga¬ 
ble  corridor  is  calculated  and  driver/controller  inputs  are  scaled  accordingly.  This 
approach  minimizes  controller  intervention  while  ensuring  that  the  vehicle  does 
not  depart  from  a  navigable  corridor  of  travel.  It  also  allows  for  multiple  actuation 
modes,  diverse  trajectory-planning  objectives,  and  varying  levels  of  autonomy. 
Experimental  results  are  presented  here  to  demonstrate  the  framework’s  semi- 
autonomous  performance  in  hazard  avoidance  scenarios. 


Introduction 


Recent  traffic  safety  reports  from  the  National  Highway  Traffic  and  Safety  Ad¬ 
ministration  show  that  in  2007  alone,  over  41,000  people  were  killed  and  another 
2.5  million  injured  in  motor  vehicle  accidents  in  the  United  States  [1].  The  long- 
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standing  presence  of  passive  safety  systems  in  motor  vehicles,  combined  with  the 
ever-increasing  influence  of  active  systems,  has  contributed  to  a  decline  in  these 
numbers  from  previous  years.  Still,  the  need  for  improved  collision  avoidance 
technologies  remains  significant. 

Recent  developments  in  onboard  sensing,  lane  detection,  obstacle  recognition, 
and  drive-by-wire  capabilities  have  facilitated  active  safety  systems  that  share 
steering  and/or  braking  control  with  the  driver  [2,3].  Among  existing  proposals  for 
semi-autonomous  vehicle  navigation,  lane-keeping  systems  using  audible  warn¬ 
ings  [4],  haptic  alerts  [5],  steering  torque  overlays  [6],  and  various  combinations 
of  these  have  been  developed  [7]. 

Many  of  the  navigation  systems  developed  in  previous  work  address  only  one 
piece  of  the  active  safety  problem.  While  some  use  planning  algorithms  such  as 
rapidly-exploring  random  trees  [3],  evolutionary  programming  [8]  or  potential 
fields  analysis  [9]  to  plan  a  safe  vehicle  path,  others  simply  begin  with  this  path 
presumed  [10].  The  threat  posed  by  a  particular  path  is  seldom  assessed  by  the 
controller  itself  and  is  often  only  estimated  by  a  simple  threat  metric  such  as  lat¬ 
eral  vehicle  acceleration  required  to  track  the  path  [11].  Finally,  hazard  avoidance 
is  commonly  performed  using  one  or  more  actuation  methods  without  explicitly 
accounting  for  the  effect  of  driver  inputs  on  the  vehicle  trajectory.  Such  control¬ 
lers  selectively  replace  (rather  than  assist)  the  driver  in  performing  the  driving 
task.  Yu  addressed  this  problem  in  mobility  aids  for  the  elderly  by  designing  an 
adaptive  shared  controller  which  allocates  control  authority  between  the  human 
user  and  a  controller  in  proportion  to  the  user’s  current  and  past  performance  [12]. 
While  sufficient  to  control  low-speed  mobility  aids,  this  reactive  approach  to 
semi-autonomy  is  not  well  suited  for  higher-speed  applications  with  significant  in¬ 
ertia  effects  and  no  pre-planned  trajectory. 

In  this  paper,  a  framework  for  passenger  vehicle  active  safety  is  developed  that 
performs  vehicle  trajectory  planning,  threat  assessment,  and  hazard  avoidance  in  a 
unified  manner.  This  framework  leverages  the  predictive  and  constraint-handling 
capabilities  of  Model  Predictive  Control  (MPC)  to  plan  trajectories  through  a  pre¬ 
selected  corridor,  assess  the  threat  this  path  poses  to  the  vehicle,  and  regulate 
driver  and  controller  inputs  to  maintain  that  threat  below  a  given  threshold.  The 
next  section  describes  the  semi-autonomous  control  framework  and  its  associated 
trajectory  prediction,  control  law,  threat  assessment,  and  intervention  law.  Ex¬ 
perimental  setup  and  results  are  then  presented,  and  the  paper  closes  with  general 
conclusions  and  recommendations. 


Framework  Description 


The  navigation  framework  operates  as  follows.  First,  an  objective  function  is  es¬ 
tablished  to  capture  desirable  performance  characteristics  of  a  safe/“optimal”  ve¬ 
hicle  path.  Boundaries  tracing  the  edges  of  the  drivable  road  surface  are  assumed 
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to  have  been  derived  from  forward-looking  sensor  data  and  a  higher-level  corridor 
planner.  These  boundaries  establish  constraints  on  the  vehicle’s  projected  position 
and  are  used  together  with  a  model  of  the  vehicle  dynamics  to  calculate  an  optimal 
sequence  of  inputs  and  the  associated  vehicle  trajectory.  The  predicted  trajectory 
is  assumed  to  be  a  “best-case”  scenario  that  poses  the  minimum  threat  to  the  vehi¬ 
cle  given  its  current  state.  This  threat  is  then  used  to  calculate  the  intervention  re¬ 
quired  to  prevent  departure  from  the  navigable  corridor  and  driver/controller  in¬ 
puts  are  scaled  accordingly.  Fig.  1  shows  a  block  diagram  of  this  system. 


Map  of  regions  and 
surface  properties 


Terrain  and  obstacle  % 

Identification  7  /  / 


Lookahead 
sensing 

Terrain  Interactions, 
environmental 
disturbances 


Fig.  1.  Diagram  of  an  active  safety  system. 

In  this  paper  it  is  assumed  that  road  lane  data  is  available  and  that  road  hazards 
have  been  detected,  located,  and  mapped  to  form  the  boundaries  of  a  2- 
dimensional  corridor  of  travel.  Radar,  LIDAR,  and  vision-based  lane-recognition 
systems  [3,13],  along  with  various  sensor  fusion  approaches  [14]  have  been  pro¬ 
posed  to  provide  the  lane,  position,  and  environmental  information  needed  by  this 
framework.  Additionally,  where  multiple  corridor  options  exist,  it  is  assumed  that 
a  high-level  path  planner  has  selected  a  single  corridor  through  which  the  vehicle 
should  travel. 


Vehicle  Path  Planning 

The  best-case  (or  baseline)  path  through  a  given  region  of  the  state  space  is  estab¬ 
lished  by  a  model  predictive  controller.  Model  Predictive  Control  is  a  finite- 
horizon  optimal  control  scheme  that  iteratively  minimizes  a  performance  objective 
defined  for  a  forward-simulated  plant  model  subject  to  performance  and  input 
constraints.  At  each  time  step,  t,  the  current  plant  state  is  sampled  and  a  cost¬ 
minimizing  control  sequence  spanning  from  time  t  to  the  end  of  a  control  horizon 
of  n  sampling  intervals,  t+nAt ,  is  computed  subject  to  inequality  constraints.  The 
first  element  in  this  input  sequence  is  implemented  at  the  current  time  and  the 
process  is  repeated  at  subsequent  time  steps.  The  basic  MPC  problem  setup  is  de¬ 
scribed  in  [15]. 

The  vehicle  model  used  in  this  paper  considers  the  kinematics  of  a  4-wheeled 
vehicle,  along  with  its  lateral  and  yaw  dynamics.  Vehicle  states  include  the  posi- 


4 


tion  of  its  center  of  gravity  [x,  y],  its  yaw  angle  y/ ,  yaw  rate  y/,  and  sideslip  angle 
/?,  as  illustrated  in  Fig.  2.  Table  1  defines  and  quantifies  this  model’s  parameters. 


Fig.  2.  Vehicle  model  used  in  MPC  controller. 


Table  1. Vehicle  model  parameters. 


Symbol 

Description 

Value  [units] 

m 

Total  vehicle  mass 

2050  [kg] 

Izz 

Yaw  moment  of  inertia 

3344  [kg-m2] 

Xf 

C.g.  distance  to  front  wheels 

1.43  [m] 

Xr 

C.g.  distance  to  rear  wheels 

1.47  [m] 

yw 

Track  width 

1.44  [m] 

Cf 

Front  cornering  stiffness 

1433  [N/deg] 

cr 

Rear  cornering  stiffness 

1433  [N/deg] 

V 

Surface  friction  coefficient 

1 

Tire  compliance  is  included  in  the  model  by  approximating  lateral  tire  force 
(Fy)  as  the  product  of  wheel  cornering  stiffness  (C)  and  wheel  sideslip  (< a  or  /?  for 
front  or  rear  wheels  respectively)  as  shown  in  (1). 


Fy=Ca  (1) 

Linearized  about  a  constant  speed  and  assuming  small  slip  angles,  the  equations  of 
motion  for  this  model  are  (where  S  represents  the  steering  angle  input) 


x  =  V 

y  =  V{v+p) 

j,  ~{Cr+Cf)  ((C^-C^) 


mV 


mV 1 


-1  I  y/+  —  S 
1  mV 


[crxr  - Cfxf )  (crx/  +  Cj-Xj2 )  .  Cfx 

I  P  IV  I 


¥  = 


(2) 

(3) 

(4) 

(5) 


where  Cf  and  Cr  represent  the  cornering  stiffness  of  the  lumped  front  wheels  and 
the  lumped  rear  wheels,  and  x/ and  x,.  are  the  longitudinal  distances  from  the  c.g.  of 
the  front  and  rear  wheels,  respectively. 
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Constraint  Setup  and  Objective  Function  Description 

As  mentioned  above,  this  framework  assumes  that  the  environment  has  been  de¬ 
lineated  previously.  The  boundaries  of  the  navigable  road  surface  at  each  timestep 
are  then  described  by  the  constraint  vectors 

y  max  (A^)  —  [y^max  {k  +  l)  *  *  *  y  max  ( Ji  +  /?)] 
y  Vmin  (k)  =  \yy  min  (k  +  l)  *  *  *  y ymin  ( k  +  pjf 

By  enforcing  vehicle  position  constraints  at  the  boundaries  of  the  navigable  re¬ 
gion  of  the  road  surface  (i.e.  the  lane  edges  on  an  unobstructed  road),  the  control¬ 
ler  forces  the  MPC-generated  path  to  remain  within  the  constraint-bounded  corri¬ 
dor  whenever  dynamically  feasible.  Coupling  this  lateral  position  constraint  with 
input  constraints  umin/max,  input  rate  constraints  Aumin/max,  and  vehicle  dynamic 
considerations,  the  navigable  operating  corridor  delineated  by  y7max  and  yymin 
translates  to  a  safe  operating  region  within  the  state  space. 

The  controller’s  projected  path  through  the  constraint- imposed  tube  is  shaped 
by  the  performance  objectives  established  in  the  MPC  cost  function.  While  many 
options  exist  for  characterizing  desirable  vehicle  trajectories,  here,  the  total  side¬ 
slip  angle  at  the  front  wheels  (a)  was  chosen  as  the  trajectory  characteristic  to  be 
minimized  in  the  objective  function.  This  choice  was  motivated  by  the  strong  in¬ 
fluence  front  wheel  sideslip  has  on  the  controllability  of  front-wheel- steered  vehi¬ 
cles  since  cornering  friction  begins  to  decrease  above  critical  slip  angles.  In  [16]  it 
is  shown  that  limiting  tire  slip  angle  to  avoid  this  strongly  nonlinear  (and  possibly 
unstable)  region  of  the  tire  force  curve  can  significantly  enhance  vehicle  stability 
and  performance.  Further,  the  linearized  tire  compliance  model  described  here 
does  not  account  for  this  decrease,  motivating  the  suppression  of  front  wheel  slip 
angles  to  reduce  controller-plant  model  mismatch. 

The  MPC  objective  function  with  weighting  matrices  7?(.)  then  takes  the  form 

k  +  p- 1  -i  k 

+  x  + 

i=k  2 

where  s  represents  constraint  violation  and  was  included  to  soften  select  position 
constraints  as  yjmin  -^min  <  yj  <  y /max  +  £\J max  • 


<7> 

i=k  2  z 


k  +  p 


=  X 

i=k+ 1  z 


Threat  Assessment 

The  vehicle  path  calculated  by  the  MPC  controller  is  assumed  to  be  the  best-case 
or  safest  path  through  the  environment.  As  such,  key  metrics  from  this  prediction 
are  used  to  assess  instantaneous  threat  posed  to  the  vehicle.  By  setting  constraint 
violation  weights  (p& )  significantly  higher  than  the  competing  minimization  weight 
( Ra )  on  front  wheel  sideslip,  optimal  solutions  satisfy  corridor  constraints  before 
minimizing  front  wheel  sideslip.  When  constraints  are  not  active,  front  wheel 
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sideslip  -  and  the  corresponding  controllability  threat  -  is  minimized.  When  the 
solution  is  constrained,  predicted  front  wheel  sideslip  increases  with  the  severity 
of  the  maneuver  required  to  remain  within  the  navigable  corridor. 

Various  approaches  are  available  to  reduce  the  predicted  front  wheel  sideslip 
vector  a  to  a  scalar  threat  metric  ®.  In  this  paper, 


®(*)  =  maxflor, 


a. 


k+p 


r 


(8) 


was  chosen  for  its  good  empirical  performance  when  used  to  regulate  controller 
intervention  (described  in  the  next  section). 


Hazard  Avoidance 


Given  a  best-case  vehicle  path  through  the  environment  and  a  corresponding 
threat,  desired  inputs  from  the  driver  and  controller  are  blended  and  applied  to  the 
vehicle.  This  blending  is  performed  based  on  the  threat  assessment:  a  low  pre¬ 
dicted  threat  causes  more  of  the  driver’s  input  and  less  of  the  controller’s  input  to 
be  applied  to  the  vehicle,  while  high  threat  allows  controller  input  to  dominate  that 
of  the  driver.  This  “scaled  intervention”  may  thereby  allow  for  a  smooth  transition 
in  control  authority  from  driver  to  controller  as  threat  increases. 

Denoting  the  current  driver  input  by  Udr  and  the  current  controller  input  by 
uMpc 5  the  blended  input  seen  by  the  vehicle,  uv ,  is  defined  as 

uv=K(<$>)uh,PC+(\-K(<S>j)udr  (9) 


The  intervention  function  K  is  used  to  translate  predicted  vehicle  threat  O  (ob¬ 
tained  from  the  MPC  trajectory  plan)  into  a  scalar  blending  gain.  This  function  is 
bounded  by  0  and  1  and  may  be  linear,  piecewise-linear,  or  nonlinear.  Linear  and 
piecewise-linear  forms  of  this  function  may  be  described  by 


K  =  f{0)  = 


0 


0<0<0 


eng 


O  -O 

_ fa* _  O  <o<0 

eng  aut 


(10) 


<J>  -O 

aut  eng 


o>o 


where  Oeng  and  ®aut  represent  the  threat  level  at  which  the  controller  engages  and 
the  level  at  which  it  is  given  full  control  authority  and  effectively  acts  as  an 
autonomous  controller. 

Using  predicted  threat  (O)  as  calculated  in  (8)  with  an  appropriate  cost  function 
formulation  of  the  form  (7)  ensures  that  1)  the  threat  metric  regulating  controller 
intervention  is  minimized  in  the  path  plan  (and  associated  control  calculation)  and 
2)  the  controller  maintains  full  control  authority  when  constraints  are  binding.  In¬ 
creasing  Oeng  widens  the  “low  threat”  band  in  which  the  driver’s  inputs  are  unaf¬ 
fected  by  the  controller.  Increasing  the  value  of  Oaut,  on  the  other  hand,  delays 
complete  controller  intervention  until  more  severe  maneuvers  are  predicted.  The 
friction-limited  bounds  on  the  linear  region  of  the  tire  force  curve  (1)  suggest  a 
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natural  upper  limit  of  ®aut<  5  degrees  in  order  to  ensure  that  by  the  time  the  pre¬ 
dicted  maneuver  required  to  remain  within  the  safe  region  of  the  state  space 
reaches  this  level  of  severity,  the  controller  has  full  control  authority  and  can  - 
unless  unforeseen  constraints  dictate  otherwise  -  guide  the  vehicle  to  safety. 


Experimental  Setup 

Experimental  testing  was  performed  at  14  m/s  using  a  test  vehicle  and  three  hu¬ 
man  drivers.  Driver  and  actuator  steering  inputs  were  coupled  via  an  Active  Front 
Steer  (AFS)  system.  An  inertial  and  GPS  navigation  system  was  used  to  measure 
vehicle  position,  sideslip,  yaw  angle,  and  yaw  rate  while  a  1  GHz  dSPACE  proc¬ 
essor  ran  controller  code  and  interfaced  with  steering  actuators. 

Three  common  scenarios  were  used  to  analyze  system  performance.  In  each 
scenario,  obstacles,  hazards,  and  driver  targets  were  represented  to  the  driver  by 
cones  and  lane  markings  and  to  the  controller  by  a  constrained  corridor  (with  on¬ 
board  sensing  and  constraint  mapping  assumed  to  have  been  performed  previously 
by  “virtual  sensors”  and  high-level  planners  respectively).  Only  results  from  mul¬ 
tiple-hazard-avoidance  tests  are  shown  below.  In  these  tests  (illustrated  in  Fig.  3), 
both  lanes  of  travel  were  blocked  at  different  locations,  forcing  the  vehicle  to 
change  lanes  to  avoid  the  first  hazard,  then  change  lanes  again  to  avoid  the  second. 


Fig.  3.  Multiple  hazard  avoidance  test  setup  showing  hazard  cone  placement  (circles)  and  lane  bounda¬ 
ries  (dashed). 

Two  types  of  human  driver  inputs  were  tested.  Drowsy,  inattentive,  or  other¬ 
wise  impaired  drivers  were  represented  by  a  constant  driver  steer  input  of  zero  de¬ 
grees.  In  these  tests,  the  unassisted  driver’s  path  formed  a  straight  line  directly 
through  the  obstacle(s).  To  represent  active  driver  steer  inputs,  the  drivers  were 
asked  to  steer  either  around  or  into  obstacles. 

Controller  parameters  are  described  and  quantified  in  Table  2. 

Table  2. Controller  parameters. 

Symbol  Description  Value  [units] 

P 

n 

R  (a) 


Prediction  horizon 
Control  horizon 
Weight  on  front  wheel  slip 


{35, 40} 
{18,20} 
0.2657 


Ru 

Weight  on  steering  input 

0.01 

Rau 

Weight  on  steering  input  rate  (A  per  At) 

0.01 

Wmin/max 

Steering  input  constraints 

±  10  [deg] 

AWmin/max 

steering  input  rate  (per  At)  constraints 

±  .75  [deg]  (15  deg/s) 

V 

y  min/max 

Lateral  position  constraints 

Scenario-dependent 

Ps 

Weight  on  constraint  violation 

1  x  105 

[<I>eng  <J>aut] 

Thresholds  for  controller  intervention 

{[0  3],  [1  3]}  deg 

V 

Variable  constraint  relaxation  on  vehicle  position 

[1.25,  1.25,0.01] 

Experimental  Results 


The  semi-autonomous  framework  proved  capable  of  keeping  the  vehicle  within 
the  navigable  corridor  for  each  of  the  maneuvers,  using  various  system/controller 
configurations,  and  with  three  different  human  drivers.  Results  from  multiple  haz¬ 
ard  avoidance  experiments  are  shown  below. 

Fig.  4  compares  a  semi-autonomous  multi-hazard-avoidance  maneuver  to  an 
autonomous  maneuver  (X=l). 


Fig.  4.  Multiple  hazard  avoidance  tests  showing  the  similarity  between  semi-autonomous  (dash-dot) 
and  autonomous  (solid)  vehicle  trajectories. 

Notice  that  the  semi-autonomous  controller  delayed  intervention  until  the 
driver’s  inputs  put  the  vehicle  at  risk  of  leaving  the  navigable  road  surface.  When 
the  framework  did  intervene,  it  allocated  enough  control  authority  to  the  controller 
to  avert  corridor  departure  or  loss  of  control.  Also  notice  that  even  with  average 
controller  intervention  Kaye=0A4,  the  vehicle  trajectory  obtained  using  the  semi- 
autonomous  controller  very  closely  resembles  the  “best  case”  trajectory  taken  by 
the  autonomous  controller.  This  results  from  the  selective  nature  of  the  semi- 
autonomous  system  -  it  intervenes  only  when  necessary,  then  relinquishes  control 
to  the  driver  once  threat  to  the  vehicle  has  been  reduced. 

Fig.  5  shows  experiments  in  which  the  driver  was  instructed  to  swerve  at  the 
last  minute  to  avoid  hazards. 
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Fig.  5.  Multiple  hazard  avoidance  tests  showing  the  vehicle  trajectory  with  an  unassisted  driver  input 
(dashed)  and  autonomous  controller  (solid),  and  semi-autonomous  controller  (dash-dot). 

Notice  that  intervention  by  the  semi-autonomous  controller  slightly  preceded 
an  otherwise-late  driver  reaction.  The  combined  effect  of  both  inputs  was  then  suf¬ 
ficient  to  avoid  both  road  hazards. 

Finally,  in  each  of  the  above  experimental  results,  this  shared- adaptive  control¬ 
ler  behaves  as  a  stable  closed-loop  system.  While  this  was  also  true  of  all  of  the 
other  simulated  and  experimental  results  conducted  to  date,  no  rigorous  stability 
proof  is  presented  in  this  paper. 


Conclusions 


This  paper  presented  an  optimal-control-based  framework  that  performs  trajectory 
planning,  threat  assessment,  and  semi-autonomous  control  of  passenger  vehicles 
in  hazard  avoidance.  This  framework  has  been  proven  experimentally  capable  of 
satisfying  position,  input,  and  dynamic  vehicle  constraints  using  multiple  threat 
metrics  and  intervention  laws.  Additionally,  this  framework  has  been  shown  to 
provide  significant  autonomy  to  a  human  driver,  intervening  only  as  necessary  to 
keep  the  vehicle  under  control  and  within  the  navigable  roadway  corridor.  Ex¬ 
perimental  results  have  also  shown  this  control  framework  to  be  stable  even  in  the 
presence  of  system-inherent  time  delays,  though  a  rigorous  stability  proof  is  a 
topic  of  current  investigation. 

Finally,  while  human  factors  have  not  been  studied  in  depth  here,  it  is  expected 
that  with  additional  investigation,  a  best-case,  or  average  driver-preferred  inter¬ 
vention  law  may  be  described  and  intervention  settings  tuned  accordingly.  Further 
work  is  needed  before  this  research  is  road-ready. 
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Abstract  As  mobile  robots  venture  into  more  difficult  environments,  more  com¬ 
plex  state- space  paths  are  required  to  move  safely  and  efficiently.  The  difference 
between  mission  success  and  failure  can  be  determined  by  a  mobile  robot’s  ca¬ 
pacity  to  effectively  navigate  such  paths  in  the  presence  of  disturbances.  This  pa¬ 
per  describes  a  technique  for  mobile  robot  model  predictive  control  that  utilizes 
the  structure  of  a  regional  motion  plan  to  effectively  search  the  local  continuum 
for  an  improved  solution.  The  contribution,  the  receding  horizon  model-predictive 
control  (RHMPC)  technique,  specifically  addresses  the  problem  of  path  following 
and  obstacle  avoidance  through  geometric  singularities  and  discontinuities  such  as 
cusps,  turn-in-place,  and  multi-point  turn  maneuvers  in  environments  where  ter¬ 
rain  shape  and  vehicle  mobility  effects  are  non-negligible.  The  technique  is  formu¬ 
lated  as  an  optimal  controller  that  utilizes  a  model-predictive  trajectory  generator 
to  relax  parameterized  control  inputs  initialized  from  a  regional  motion  planner  to 
navigate  safely  through  the  environment.  Experimental  results  are  presented  for  a 
six-wheeled  skid-steered  field  robot  in  natural  terrain. 


1  Introduction 


Mobile  robot  navigation  is  the  challenge  of  selecting  intelligent  actions  from  the 
continuum  of  possible  actions  that  make  progress  towards  achieving  some  goal  un¬ 
der  the  constraints  of  limited  perceptual  information,  computational  resources,  and 
planning  time  of  the  system.  It  also  often  viewed  as  the  problem  of  balancing  path 
following  and  obstacle  avoidance  in  autonomous  system  architectures.  Regional 
motion  planning  is  the  problem  of  planning  beyond  the  perceptive  (sensor)  hori¬ 
zon. 
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A  state-space  trajectory  is  typically  defined  as  a  vector  valued  function  of  mono¬ 
tonic  time  (t).  There  are,  however,  circumstances  where  time  is  replaced  by  poten¬ 
tially  nonmonotonic  functions  of  distance  (s)  or  heading  (\jf)  to  form  a  path.  Path 
representations  are  used  to  achieve  behaviors  that  allows  velocity  to  remain  unspec¬ 
ified.  A  “cusp”  is  a  point  in  a  trajectory  where  linear  velocity  changes  sign.  While 
cusps  are  discontinuous  in  path  curvature,  they  are  not  discontinuous  in  state  space 
trajectories  and  are  perfectly  feasible  motions.  Furthermore,  the  concept  of  “for¬ 
ward  on  a  path”  is  not  well-defined  for  cusps  (and  likewise  for  point  turns)  whereas 
“forwards  in  time”  always  has  meaning.  The  capacity  of  a  state  space  trajectory  rep¬ 
resentation  to  remove  discontinuities  and  permit  a  forward  horizon  to  be  defined  are 
the  basis  of  our  preference  for  this  representation. 

A  reference  trajectory  is  the  state-space  trajectory  (x(f)1)  provided  by  a  regional 
motion  planner  (or  other  form  of  global  guidance).  The  reference  inputs  (u(x,r)) 
are  the  controls  which  cause  the  vehicle  to  follow  the  path  perfectly  in  the  absence 
of  disturbances.  In  the  presence  of  disturbances,  the  reference  control  signals  that 
correspond  to  a  disturbance  free  trajectory  must  be  augmented  by  corrective  controls 
to  null  the  following  error  over  some  time  horizon. 


1.1  Motivation 

As  mobile  robots  navigate  intricate  motion  plans  composed  of  cusps,  turn-in-place, 
and  multi-turn  maneuvers,  the  geometric  singularities  and  discontinuities  of  these 
inflection  points  become  problematic.  Commonly  applied  techniques  cannot  gener¬ 
ally  reason  about  actions  beyond  these  problematic  points,  which  can  endanger  the 
system  or  impede  path  following  performance  by  limiting  the  navigation  horizon. 

Consider  the  situation  shown  in  Figure  1 .  In  this  example,  the  mobile  robot  de¬ 
viates  from  the  reference  trajectory  from  disturbances  including  errors  in  modeling 
dynamics,  terramechanical  properties,  and  mobility. 


Fig.  1  A  vehicle  attempts  to  to  follow  a  reference  trajectory  with  geometric  singularities. 


1  The  state  (x)  contains  the  vehicle  position,  orientation,  velocity,  or  any  other  quantity  of  interest 
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The  popular  class  of  “pursuit”  algorithms  [1]  will  round  path  comers,  avoid 
cusps,  and  fail  for  turn-in-place  maneuvers  where  the  pursuit  point  becomes  unde¬ 
fined.  In  contexts  where  such  intricate  maneuvers  were  generated  by  a  path  planner 
in  order  to  avoid  obstacles,  a  pursuit  planner  is  inadequate.  Sampling-based  obstacle 
avoidance  techniques  [6]  sometimes  fail  for  intricate  path  navigation  because  of  the 
computational  resources  required  to  search  the  entire  input  or  state  space  densely 
enough  to  find  an  acceptable  solution. 

For  effective  intricate  path  navigation,  a  technique  is  needed  which  can  exploit 
the  reference  trajectory  structure  to  search  in  the  local  continuum  for  actions  which 
minimize  path  deviation  and  avoid  obstacles.  This  is  the  process  of  parametric  re¬ 
laxation,  the  technique  of  rendering  a  functional  on  a  few  parameters  in  order  to 
permit  relaxation  of  a  trajectory  (for  optimization  purposes)  by  searching  a  small 
number  of  degrees  of  freedom. 


1.2  Related  Work 

There  has  been  substantial  research  in  the  problem  of  developing  effective,  efficient 
mobile  robot  navigators.  Early  path  following  controllers  operate  on  the  assumption 
of  tracking  a  single  lookahead  point  and  have  been  greatly  extended  in  the  literature 
[4].  In  [12],  effective  search  spaces  for  navigation  in  roads  and  trails  were  produced 
by  generating  nudges  and  swerves  to  the  motion  that  reacquires  the  center  of  the 
lane. 

An  alternative  approach  involves  sampling  in  the  input  space  of  the  vehicle.  In 
[6],  navigation  search  spaces  were  generated  by  sampling  in  the  input  space  of  cur¬ 
vature.  This  approach  also  estimated  the  response  of  each  action  through  a  predictive 
motion  model  subject  to  the  initial  state  constraints  to  more  accurately  predict  the 
consequences  of  the  actions.  Egographs  [8]  represent  a  technique  for  generating  ex¬ 
pressive  navigation  search  spaces  offline  by  precomputing  layered  trajectories  for  a 
discrete  set  of  initial  states.  Precomputed  arcs  and  point  turns  comprised  the  control 
primitive  sets  used  to  autonomously  guide  planetary  rovers  for  geologic  exploration 
[2]  where  convolution  on  a  cost  or  goodness  map  determined  the  selected  trajec¬ 
tory.  This  approach  was  an  extension  of  Morphin  [11],  an  arc-planner  variant  where 
terrain  shape  was  considered  in  the  trajectory  selection  process.  Another  closely 
related  algorithm  is  the  one  presented  in  [3],  where  an  arc-based  search  space  is 
evaluated  based  on  considering  risk  and  interest. 

Other  techniques  such  as  rapidly-exploring  random  trees  [7]  have  been  effec¬ 
tively  used  to  generate  search  spaces  around  the  mobile  robot  to  navigate  cluttered, 
difficult  environments  and  generate  sophisticated  maneuvers  including  u-turns.  [9] 
presents  a  reactive  path  following  controller  for  a  unicycle  type  mobile  robot  built 
with  a  Deformable  Virtual  Zone  to  navigate  paths  without  the  need  for  global  path 
replanning. 
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1.3  Discriminators 

The  main  contribution  of  this  work  is  the  development  of  a  receding  horizon  model- 
predictive  controller  (RHMPC),  a  trajectory  follower  that  effectively  navigates  intri¬ 
cate  paths  in  complex  environments.  The  algorithm  leverages  the  capacity  to  gener¬ 
ate  the  reference  controls  for  a  given  reference  trajectory.  This  capability  exists  be¬ 
cause  the  sequence  of  reference  controls  can  be  generated  by  a  trajectory  generator 
that  understands  the  association  between  inputs  and  the  corresponding  state- space 
trajectory.  Our  particular  preference  is  parameterized  controls,  but  the  key  issue  is 
that  the  controls  are  known,  however  represented,  which  correspond  exactly  to  the 
reference  trajectory.  The  corresponding  reference  trajectory  inputs  are  already  avail¬ 
able  in  many  regional  motion  planner  implementations,  so  this  simply  requires  that 
this  additional  information  is  passed  to  the  navigator  with  the  reference  trajectory. 
Field  experiment  results  demonstrate  that  the  proposed  technique  can  effectively 
navigate  intricate  paths. 


2  Technical  Approach 

This  section  describes  the  issues  related  to  navigation  of  intricate  paths  generated  by 
regional  motion  planners,  the  methods  by  which  parameterized  controls  are  gener¬ 
ated,  and  the  trajectory  optimization  techniques  used  to  generate  corrective  actions. 
The  trajectory  follower  is  formulated  as  an  optimal  control  problem: 

minimize:  J(x,u,f) 
subject  to:  x  =  f(x,  u,  t) 

*0/)=xi 

u(x,f)  G  U(x,f),  t  G  I ti,tp\ 

The  problem  is  one  of  determining  actions  from  a  set  of  functions  (U (x,t))  to 
represent  the  control  inputs  (u (x,t))  which,  when  subject  to  the  predictive  motion 
model  (f(x,  u ,t)),  minimize  a  penalty  function  (J(x,  u,  t)).  An  additional  requirement 
for  the  trajectory  follower  is  that  the  resulting  control  must  defined  for  a  specific 
period  of  time  or  distance.  This  allows  the  optimized  path  to  be  evaluated  for  hazards 
to  ensure  vehicle  safety. 


2.1  Control  Parameterization 


One  of  the  most  difficult  problems  in  motion  planning  involves  reducing  the  contin¬ 
uum  of  actions  to  a  manageable  space  to  search.  The  trajectory  following  technique 
that  we  present  uses  a  portion  of  the  reference  controls,  which  may  be  only  piece- 
wise  continuous,  as  the  initial  guess.  First,  the  reference  trajectory  is  divided  into 
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the  primitives  used  by  the  motion  planner  as  shown  in  Figure  2(a).  For  each  action, 
there  exists  a  set  of  controls  that,  when  applied  to  the  system,  produce  a  path  seg¬ 
ment  of  a  certain  shape.  Parameterized  freedom  vectors  (pO  control  the  shape  of 
each  set  of  inputs  (u(pi,x,f))  that  define  the  reference  trajectory. 


Fig.  2  A  portion  of  the  reference  trajectory  determines  the  initial  inputs  for  the  RHMPC  technique. 

The  initial  guess  for  the  parameterized  control  inputs  (urhmpc  (Prhmpc  >x:0) 
is  defined  by  the  sequence  of  trajectory  segments  between  the  nearest  state  and  the 
predefined  fixed  control  horizon  (Figure  2(b)).  In  this  example,  the  free  parame¬ 
ters  of  the  receding  horizon  model-predictive  controller  (Prhmpc)  are  defined  by  a 
concatenation  of  free  parameters  in  the  control  inputs: 

Prhmpc  =  [pi  P2  P3 ] T  (2) 


2.2  Path  Deviation  Optimal  Control 

Once  the  control  input  parameterization  is  determined,  the  next  step  is  to  modify  the 
parameters  to  compensate  for  disturbances,  approximations,  and  errors  in  the  motion 
model.  This  technique  seeks  to  minimize  a  cost  function  (J(x,u,r))  by  modifying  a 
set  of  control  inputs: 


rtF 

J(x,u  ,t)  =  <P(x(tI),tI,x(tF),tF)  +  Jf(x(t),u(p,x,t),t)dt  (3) 

Jti 

The  initial  corrective  action  is  evaluated  through  the  predictive  motion  model 
subject  to  the  initial  state  constraints  to  obtain  an  estimate  of  the  cost  function  (Fig¬ 
ure  3).  While  the  gradient  of  the  cost  function  with  respect  to  the  parameterized 
control  input  freedom  exceeds  a  threshold,  the  algorithm  adjusts  the  control  inputs 
to  minimize  the  integrated  penalty  function  (j£?(x,u ,t)).  The  parameterized  free¬ 
doms  are  modified  iteratively  through  any  standard  optimization  technique,  such  as 
gradient  descent,  as  the  cost  function  gradient  is  determined  entirely  numerically: 
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Prhmpc,  =  PRHMPCi-i  -  avj(x,u,r),  i>  1  (4) 


(a)  (b) 


Fig.  3  A  correction  action  is  determined  by  optimizing  the  portion  of  the  reference  trajectory  from 
the  initial  state  of  the  vehicle.  After  n  optimization  steps,  the  inputs  described  by  the  modified 
receding  horizon  model-predictive  control  parameters  reacquire  the  reference  trajectory. 


2.3  Integrating  Observed  Cost  Information 


There  are  several  situations  when  a  mobile  robot  should  intentionally  deviate  from 
the  reference  trajectory.  Navigating  around  recently  observed  static  and  dynamic 
obstacles  faster  than  the  replanning  rate  of  the  regional  motion  planner  is  important 
when  perceptual  information  is  frequently  updated.  Another  reason  for  deviation 
is  the  suboptimality  of  the  reference  trajectory  itself.  One  solution  is  to  stop  and 
request  a  refined  or  alternative  plan.  An  potentially  better  method  is  tp  include  cost 
information  into  the  utility  functional  optimized  by  the  receding  horizon  model- 
predictive  controller  to  determine  the  obstacle  avoidance  maneuver. 

The  presented  technique  is  naturally  suited  to  deform  the  current  action  for  local 
obstacle  avoidance  and  path  smoothing.  The  desired  behaviors  can  be  integrated  by 
modifying  the  cost  function  to  include  a  weighted  penalty  for  obstacle  cost: 

_S?(x,u,r)  =  wi3edeviation{x,u,t)  +  w2^cosM,'a,t)  (5) 

An  illustration  of  this  process  is  shown  in  Figure  4.  In  this  example,  new  per¬ 
ception  information  places  obstacles  along  the  reference  trajectory  (Figure  4(a)). 
Forward  simulation  of  the  corrective  action  shows  that  the  vehicle  will  inevitably 
collide  with  these  obstacles.  Rather  than  compute  an  entirely  new  regional  motion 
plan,  an  obstacle  avoidance  maneuver  is  generated  by  relaxing  the  path  subject  to 
this  new  weighted  cost  function  (Figure  4(b)). 
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(b) 

Xl  +  jlf  f(URHMPc(PRHMPC«)X)0)^ 

Fig.  4  Obstacle  avoidance  behaviors  are  integrated  into  the  trajectory  follower  simply  by  modify¬ 
ing  the  utility  function  minimized  in  the  trajectory  relaxation. 


recently  observed  obstacle  t  v 

xi  +  Jtf  f (urhmpc  (Prhmpco >  xj 


3  Implementation 


The  regional  motion  planer  used  to  generate  feasible  reference  trajectories  for  these 
experiments  running  A*  on  a  graph  composed  of  regularly  arranged  discrete  nodes 
in  a  state  space,  similar  to  [10].  The  connectivity  between  nodes  in  the  discretized 
graph  was  provided  by  a  motion  template  consisting  of  forward,  reverse,  and  turn- 
in-place  trajectories  with  lengths  varying  between  3m  and  9m.  This  particular  imple¬ 
mentation  operated  on  a  60m  x  60m  vehicle-centered  cost  map.  Reference  trajectory 
updates  were  provided  by  the  regional  motion  planner  at  a  rate  of  2  Hz. 

The  resulting  reference  trajectory  is  a  series  of  sequential  independent  parame¬ 
terized  trajectories.  Intricate  paths  composed  of  multiple  cusps  and/or  turn-in-place 
actions  often  result  from  the  diversity  and  expressiveness  of  edges  in  the  motion 
planning  graph  and  the  complexity  of  the  environment. 

The  model-predictive  trajectory  generator  [5]  was  used  in  both  the  motion  tem¬ 
plate  generation  and  the  path  deviation  optimal  control.  Actions  in  the  motion  tem¬ 
plate  were  composed  of  constant  linear  velocities  and  either  second-order  spline 
curvature  functions  parameterized  by  distance  or  constant  angular  velocity  functions 
parameterized  by  heading.  Generic  spline  classes  defined  by  sequences  of  individual 
command  profiles  parameters  optimized  by  the  receding  horizon  model-predictive 
controller.  The  corrective  actions  were  generated  by  the  receding  horizon  model- 
predictive  controller  at  a  rate  of  20  Hz. 


4  Experiments 

A  set  of  experiments  were  designed  as  a  comparison  between  a  navigator  that  used 
the  presented  trajectory  follower  and  one  that  directly  executed  the  regional  motion 
plan.  Both  systems  used  the  same  version  of  a  lattice  planner  that  searches  dynam¬ 
ically  feasible  actions  which  was  specifically  designed  for  the  test  platform.  Each 
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field  experiment  was  required  to  achieve  a  series  of  waypoints  in  an  environment 
with  updating  perceptual  information  generated  by  an  on-board  perception  system 
combined  with  limited  overhead  prior  data. 

The  platform  for  the  field  experiments  was  Crusher  (Figure  5(a)),  a  six- wheeled 
skid  steered  outdoor  mobile  robot.  The  multi-kilometer  experiments  were  conducted 
at  a  test  site  in  Pittsburgh,  Pennsylvania  with  variable  off-road  terrain  (Figure  5(b)). 


(a)  Crusher 


(b)  The  field  experiment  courses 


Fig.  5  The  mobile  robot  and  test  environment  for  the  RHMPC  field  experiments. 


Integrated  path  cost  was  the  main  metric  used  to  measure  success  for  the  field 
experiments,  which  is  related  to  the  risk,  mobility,  and  traversability  for  the  vehicle’s 
configuration  in  the  environment.  While  inherently  unitless  and  scaleless,  it  provides 
the  best  metric  for  measuring  performance  because  both  the  motion  planner  and  the 
trajectory  follower  optimize  this  quantity. 


5  Results 


This  section  presents  the  results  of  the  three  field  experiments  comparing  the  per¬ 
formance  of  the  presented  trajectory  follower  to  a  system  directly  executing  the 
regional  motion  plan.  Figures  6(a)-(c)  show  the  integrated  cost  of  each  systems  be¬ 
tween  each  waypoint.  It  is  useful  to  look  at  each  waypoint- waypoint  segment  sepa¬ 
rately  because  each  one  can  be  considered  to  be  an  independent  trial. 

On  average,  the  system  using  the  trajectory  follower  slightly  outperformed  or 
achieved  a  similar  level  of  performance  of  the  alternative  system.  For  portions  of 
the  course  where  disturbances  relative  to  the  predicted  motion  are  uncommon  or  the 
local  cost  gradient  was  small  near  the  disturbances  very  little  improvement  would 
be  expected,  with  more  improvement  expected  in  cases  where  small  system  dis¬ 
turbances  can  quickly  lead  to  significantly  different  path  cost.  Figure  6(d)  shows 
the  total  integrated  cost  for  each  of  the  three  field  experiments.  It  is  important  to 
note  that  while  the  trajectory  follower  did  not  outperform  the  alternative  system  be¬ 
tween  every  waypoint,  it  did  improve  the  overall  performance  of  the  system  by  up 
to  7.2%.  The  variability  in  the  results  is  expected  because  of  the  chaotic  nature  of 
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outdoor  mobile  robots  were  any  number  of  small  changes  can  cause  the  robot  to 
select  a  significantly  different  path. 


Waypoint  Path  Cost  vs.  Waypoint 


(a)  Course  1  waypoint  path  cost 


Waypoint  Path  Cost  vs.  Waypoint 


(c)  Course  3  waypoint  path  cost 


Waypoint  Path  Cost  vs.  Waypoint 


(b)  Course  2  waypoint  path  cost 


Total  Path  Cost  vs.  Course 


(d)  Total  path  cost 


Fig.  6  The  waypoint  and  total  path  cost  for  a  series  of  comparison  runs  on  three  multi-kilometer 
courses.  The  RHMPC  technique  reduced  the  accumulated  cost  experienced  by  the  robot. 


6  Conclusions 

The  receding  horizon  model-predictive  control  algorithm  enables  mobile  robots  to 
navigate  intricate  paths  by  utilizing  the  paths  by  relaxing  parameterized  controls  that 
correspond  exactly  to  the  path  shape.  This  technique  enables  planning  through  in¬ 
flection  points  and  turn-in-place  actions  in  paths  to  better  reason  about  the  recovery 
trajectory.  This  method  makes  it  possible  to  intelligently  search  the  local  contin¬ 
uum  for  an  action  which  minimizes  path  following  error  and/or  avoids  obstacles. 
It  also  enables  several  other  important  behaviors  including  the  capacity  to  define  a 
utility  function  in  situations  where  pursuit  planners  fail  and  the  ability  to  correctly 
follow  path  discontinuities  like  cusps  which  are  otherwise  feasible  motions.  Several 
multi-kilometer  field  experiments  demonstrated  that  the  inclusion  of  the  presented 
trajectory  follower  as  a  mobile  robot  navigator  improves  upon  the  metric  that  the 
regional  motion  planner  minimizes. 
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Posterior  Probability  Estimation  Techniques 
Embedded  in  a  Bayes  Filter  for  Vibration-based 
Terrain  Classification 

Philippe  Komma  and  Andreas  Zell 


Abstract  Vibration  signals  acquired  during  robot  traversal  provide  enough  informa¬ 
tion  to  yield  a  reliable  prediction  of  the  current  terrain  type.  In  a  recent  approach, 
we  combined  a  history  of  terrain  class  estimates  into  a  final  prediction.  We  there¬ 
fore  adopted  a  Bayes  filter  taking  the  posterior  probability  of  each  prediction  into 
account.  Posterior  probability  estimates,  however,  were  derived  from  support  vector 
machines  only,  disregarding  the  capability  of  other  classification  techniques  to  pro¬ 
vide  these  estimates.  This  paper  considers  other  classifiers  to  be  embedded  into  our 
Bayes  filter  terrain  prediction  scheme,  each  featuring  different  characteristics.  We 
show  that  the  best  classification  results  are  obtained  using  a  combined  k-nearest- 
neighbor  and  support  vector  machine  approach  which  has  not  been  considered  for 
terrain  classification  so  far.  Furthermore,  we  demonstrate  that  other  classification 
techniques  also  benefit  from  the  temporal  filtering  of  terrain  class  predictions. 


1  Introduction 

In  outdoor  applications  such  as  rescue  missions  or  agricultural  assignments  the  mo¬ 
bile  robot  navigates  over  varying  ground  surfaces,  each  possessing  different  charac¬ 
teristics.  To  ensure  a  safe  traversal  in  outdoor  environments  the  robot  should  adapt 
its  driving  style  according  to  the  presence  of  ground  surface  hazards  like  slippery  or 
bumpy  surfaces.  These  hazards  are  denoted  as  non-geometric  hazards  [18].  There¬ 
fore,  most  approaches  employ  a  model-based  prediction  scheme  which  estimates  the 
current  terrain  type  from  sensor  readings.  In  a  model  generation  phase,  the  model 
learns  the  correct  assignment  of  a  labeled  terrain  class  given  the  respective  obser¬ 
vation.  In  the  recall  phase,  that  is,  during  terrain  traversal  over  unknown  terrain, 
the  robot  then  uses  this  model  to  predict  the  current  ground  surface.  For  acquir- 
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ing  input  data,  a  variety  of  sensors  such  as  vision  [11,  1]  or  ladar  sensors  [15,  10] 
can  be  employed.  Recently,  several  researchers  considered  vehicle  vibrations  for 
terrain  classification  as  originally  proposed  in  [8].  In  this  context,  vibration  data  ac¬ 
quired  from  accelerometers  have  been  successfully  applied  to  planetary  rovers  [3], 
autonomous  ground  vehicles  [7],  and  experimental  unmanned  vehicles  [6].  In  [16] 
a  comparison  was  drawn  between  different  base  classifiers  providing  the  model  for 
vibration-based  terrain  classification.  These  techniques,  however,  estimate  the  ter¬ 
rain  type  using  single  sensor  measurements  only,  disregarding  the  temporal  coher¬ 
ence  between  consecutive  measurements.  We  addressed  this  problem  in  [9].  There, 
we  applied  a  Bayes  filter  to  combine  the  posterior  probabilities  of  several  recent  ter¬ 
rain  class  predictions  into  a  final  prediction.  In  our  approach,  posterior  probability 
estimation  was  performed  using  a  support  vector  machine  (SVM)  since  this  classi¬ 
fier  was  reported  to  yield  the  best  classification  results  in  a  single  observation-based 
prediction  scheme  [16]. 

To  motivate  our  current  research  we  first  note  that  the  performance  of  a  classifier  in 
the  context  of  Bayes-filtered  terrain  classification  does  not  depend  on  the  classifica¬ 
tion  quality  only  but  also  on  the  quality  of  the  prediction  certainty:  Since  the  final 
classification  is  based  on  the  posterior  probability  of  single  predictions,  it  benefits 
from  a  model  which  performs  confident  correct  predictions  and  uncertain  erroneous 
predictions.  Classifiers  which  provide  these  characteristics  result  in  a  better  predic¬ 
tion  performance  when  embedded  into  our  Bayes  filter  approach.  This  is  because 
erroneous  predictions  obtain  a  lower  weight  in  the  filtering  process  and  thus  influ¬ 
ence  the  final  prediction  less  significantly.  The  quality  of  various  classifiers  relating 
to  the  prediction  certainty  is  unclear  and  is  hence  investigated  in  this  paper.  Sec¬ 
ond,  the  SVM  classifier  is  not  an  appropriate  choice  in  all  domains,  especially  for 
online  learning  [17]  where  an  enduring  model  generation  phase  is  not  applicable. 
Thus,  this  paper  focuses  on  the  selection  of  an  adequate  classifier  with  regard  to 
its  limiting  factors  such  as  training  and  model  selection  time,  storage  requirements, 
and  the  run-time  complexity  of  the  recall  phase.  We  further  applied  the  SVM-KNN 
classifier  introduced  in  [21]  which  in  our  terrain  classification  task  was  significantly 
superior  to  all  other  classifiers  considered  so  far. 

The  remainder  of  this  paper  is  organized  as  follows:  Section  2  briefly  describes  our 
terrain  classification  model,  taking  both  one  and  several  recent  observations  into 
account.  The  posterior  probability  estimation  techniques  of  the  classifiers  to  be  em¬ 
bedded  in  our  temporally-filtered  classification  approach  are  introduced  in  Sect.  3. 
After  summarizing  our  experimental  setup  in  Sect.  4  we  present  and  discuss  exper¬ 
imental  results  in  Sect.  5.  Finally,  a  conclusion  is  given  in  the  last  section. 


2  Terrain  Classification  Model 


This  section  summarizes  our  terrain  classification  technique  based  on  both  single 
observations  and  temporal  filtering  of  several  recent  terrain  predictions.  A  detailed 
description  is  presented  in  [9]. 
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The  objective  of  our  approach  is  to  estimate  the  terrain  type  the  robot  is  currently 
traversing.  Predictions  are  model-based,  assigning  a  certain  terrain  class  from  a  set 
of  classes  to  recorded  observations.  We  represent  the  observations  by  acceleration 
data  sampled  at  a  frequency  of  100  Hz  over  a  period  of  1.28  s.  The  acceleration  data 
can  be  regarded  as  the  vibration  which  the  terrain  induces  to  the  body  of  the  robot. 
For  feature  extraction,  we  applied  the  Fast  Fourier  Transform  (FFT)  to  the  raw  input 
signal  to  determine  its  FFT  amplitude  spectrum  in  a  second  step.  We  then  normal¬ 
ized  the  data  by  scaling  each  component  of  the  preprocessed  vibration  signal  to  have 
a  mean  of  0  and  a  standard  deviation  of  1 .  The  scaled  amplitude  spectrum  entries 
constitute  the  inputs  for  the  terrain  classification  model. 

In  the  recall  phase,  the  robot  predicts  the  current  terrain  type,  using  the  terrain  clas¬ 
sification  model  generated  during  training.  Therefore,  the  same  preprocessing  has  to 
be  applied  to  the  acquired  vibration  data.  Using  the  posterior  probability  estimation 
techniques  presented  in  the  next  section,  the  application  of  the  final  feature  vector  to 
the  classifier  does  not  only  provide  a  class  prediction  but  also  an  approximation  of 
the  posterior  p(x  =  i\u).  This  probability  distribution  denotes  the  probability  that  a 
preprocessed  vibration  segment  u  belongs  to  terrain  class  i.  Next,  we  describe  how 
p(x\u)  can  be  embedded  into  a  Bayes  filter  framework. 

Using  a  Bayes  filter  [14],  the  state  of  a  dynamic  system  at  a  time  t  is  represented  by  a 
random  variable  xt.  In  our  context,  xt  G  [l;k]  models  the  uncertainty  with  which  the 
robot  navigates  over  one  of  the  k  terrain  types.  Given  t+  1  preprocessed  vibration 
segments  uo:t  =  {wo,«i, . . recorded  by  accelerometer  sensors,  the  estimated 
target  distribution  is  determined  by  p(xt\uo:t)-  In  [9]  we  showed  that  p(xt\uo:t)  can 
be  formally  defined  as: 

p(xj  =  i\u0:t )  =  a,p(xt  =  i\ut)Y^p(x,  =  =  j)p{xt-\  =  j|«o;,-i). 

j 

Here,  p(xt\ut)  substitutes  the  measurement  probability  p(ut \xt)  and  represents  the 
probability  that  the  vibration  measurement  ut  can  be  observed  when  navigating  over 
a  certain  terrain  type  xt.  p(xt\ut )  is  derived  from  the  Bayes  inversion  p(ut\xt)  = 
p(xt\ut)j^j  assuming  that  p(xt)  is  distributed  uniformly.  Further  note  that  p(ut)  is 
constant  for  all  i  and  can  thus  be  included  in  the  normalizing  constant  at. 

The  transition  probability  p(xt \xt-i)  denotes  the  probability  that  the  robot  moves 
from  terrain  type  xt-\  =  j  to  xt  =  i.  Bayes  filters  model  the  dynamic  system  by  a 
first-order  Markov  process  assuming  that  the  information  provided  by  the  state  xt 
suffices  to  predict  future  states  without  considering  earlier  observations.  Our  ap¬ 
proach  is  based  on  the  heuristic  that  the  terrain  class  most  likely  does  not  change 
from  one  measurement  to  the  next.  Thus,  we  assign  a  relatively  large  value  v  to 
p(xt  =  i\xt-\  =  i).  p(xt  =  i \xt-i  =  j),  with  i  ^  j ,  is  derived  from  the  following  two 
heuristics:  First,  the  probability  p(xt  =  i\xt~i  =  j )  should  increase  with  the  proba¬ 
bility  to  confuse  class  i  with  class  j.  Second,  a  transition  from  state  xt-\  =  j  to  state 
xt  =  i  should  be  based  on  the  probability  to  predict  the  terrain  class  at  time  t  cor¬ 
rectly.  Both  probabilities  can  directly  be  estimated  from  the  confusion  matrix.  For 
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further  details,  we  refer  to  [9].  By  dynamically  changing  v,  the  probability  that  the 
system  remains  in  its  current  state,  we  obtain  an  approach  being  both  reactive  and 
stable  enough  to  detect  fast  terrain  transitions  and  selective  misclassifications.  In  our 
implementation,  v  is  either  increased  or  decreased  by  a  constant  factor  depending 
on  whether  the  current  prediction  equals  the  system  state  at  time  t  —  1 .  Upper  and 
lower  bounds  for  v  ensure  that  the  probability  of  a  state  transition  neither  becomes 
too  large  nor  too  small. 

For  the  definition  of  the  initial  probability  distribution  p(x o),  we  make  no  assump¬ 
tions  that  the  robot  is  placed  on  a  specific  terrain  type  at  time  t  =  0.  Hence,  p(x o)  is 
assumed  to  be  uniformly  distributed. 


3  Posterior  Probability  Estimation 


In  this  section,  we  briefly  describe  all  classifiers  that  have  been  embedded  into  our 
Bayes  filter  classification  approach.  Therefore,  we  explain  how  posterior  probabil¬ 
ities  p(x  =  i\u)  can  be  predicted  for  each  class  i  under  consideration.  Since  each 
classifier  features  different  characteristics  we  conclude  this  section  by  indicating  in 
which  situations  the  choice  of  a  certain  classifier  is  appropriate. 

k-nearest  neighbor  classification  (KNN)  [5]  determines  the  set  of  k-nearest-neighbors 
contained  in  a  training  set  to  a  testing  instance  u.  Then,  we  calculate  the  frequency 
of  occurrence  of  each  class  in  the  neighbor  set.  The  class  with  the  largest  frequency 
becomes  the  predicted  class  for  the  testing  instance  u.  The  posterior  probability 
p(x  =  i\u)  is  defined  as  the  ratio  between  the  number  of  occurrences  of  class  i  in  the 
neighbor  set  n\  and  the  number  of  considered  neighbors  k,  p(x  =  i\u)  =  j. 

The  multilayer  perceptron  (MLP)  [2]  is  an  instance  of  an  artificial  neural  network. 
It  consists  of  artificial  neurons  which  are  interconnected  in  a  well-defined  manner. 
These  neurons  are  arranged  in  three  different  layers:  in  an  input  layer,  a  hidden  layer, 
and  an  output  layer.  When  applying  an  input  u  to  the  network  input,  the  neurons  of 
the  hidden  layer  perform  a  weighted  sum  of  the  input  components:  neti  =  (w/,w). 
Here,  netj  denotes  the  net  activation  of  neuron  hi  and  w/  is  the  weight  vector  de¬ 
termining  the  specific  contribution  of  each  input  component  to  the  final  sum.  We 
then  apply  an  activation  function  fact,  typically  chosen  as  fact  =  tanh(neti ),  to  each 
net  activation  to  obtain  the  final  output  for  the  neurons  of  the  hidden  layer.  The  de¬ 
termination  of  the  net  activation  of  the  output  neurons  is  equivalent  to  the  ones  of 
the  hidden  layer  except  that  we  do  not  add  weighted  input  coefficients  but  weighted 
activations  of  the  hidden  neurons.  For  classification  problems,  the  activation  func¬ 
tion  of  the  output  neurons  is  replaced  by  the  softmax  function  which  takes  the  form 
fact  =  exp (netm) / £m/  exp (netmr),  where  netm  is  the  net  activation  for  output  neuron 
m.  Each  output  neuron  represents  a  certain  class  to  discriminate.  The  predicted  class 
is  the  one  which  is  represented  by  the  neuron  with  the  maximum  activation.  It  can  be 
shown  [2]  that  the  activations  can  directly  be  interpreted  as  posterior  probabilities. 
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Probabilistic  neural  networks  (PNN)  [13]  are  another  instance  of  artificial  neural 
networks.  In  the  training  phase,  scaled  training  patterns  are  inserted  into  a  ma¬ 
trix  Wc ,  c  G  [l;k],  according  to  the  class  c  they  belong  to.  Each  row  of  Wc  rep¬ 
resents  a  single  pattern.  The  scaling  is  performed  such  that  the  L2  norm  of  each 
training  instance  equals  to  one.  In  the  recall  phase,  the  same  scaling  is  applied 
to  the  test  vector  u.  For  each  class  c,  the  inner  product  between  each  pattern  w;- 
of  the  weight  matrix  Wc  and  the  query  u  is  determined  yielding  the  net  activation 
netcj.  The  net  activations  are  non-linearly  transformed  using  the  activation  function 
fact  {net cj)  =  exp {{neti  —  l)/cr2),  where  a  is  a  model  parameter  defining  the  size  of 
the  Gaussian  window.  For  each  class,  the  sum  over  all  transformed  net  activations  is 
determined,  sc  =  £/  fact  ( netcj ),  and  the  predicted  class  becomes  the  one  which  max¬ 
imizes  v  Given  that  the  probability  of  each  class  is  distributed  uniformly,  posterior 
probabilities  p(x  =  i\u)  can  then  be  defined  as  p{x  =  i\u)  =  {njl  Si)  /  s  f) , 

where  nc  is  the  number  of  training  instances  for  class  c. 

Given  two  classes  c\  and  C2  to  discriminate,  a  support  vector  machine  (SVM)  [4] 
establishes  a  separating  hyperplane  such  that  each  instance  of  the  first  class  resides 
in  one  subspace  and  each  instance  of  the  other  class  resides  in  the  other  subspace. 
To  increase  generalization  we  maximize  the  margin  which  is  the  distance  from  the 
hyperplane  to  the  instances  closest  to  it.  In  the  non-separable  case,  that  is,  if  no  hy¬ 
perplane  exists  which  separates  the  two  classes,  instances  of  class  c\  are  allowed 
to  reside  in  the  subspace  representing  class  C2  and  vice  versa.  However,  a  penalty 
term  is  added  for  each  non-separable  training  point.  Problems  exist,  which  are  not 
linearly  separable  in  the  original  space  spanned  by  the  training  data  but  which  be¬ 
come  linearly  separable  when  mapping  the  inputs  Ui  into  a  higher  dimensional  fea¬ 
ture  space,  z  =  0(m).  Using  the  ’’kernel  trick”  the  actual  mapping  does  not  have  to 
be  performed.  Instead,  we  exploit  the  fact  that  the  inner  product  of  basis  functions 
(j> (x)T (j> (y)  is  replaced  by  a  kernel  function  K(x,y).  In  our  experiments,  we  used  the 
radial  basis  function  kernel  defined  as  K{x,y)  =  exp  (—  \\x  —  y||2/(J2).  Multi-class 
classification  using  n  classes  is  achieved  by  establishing  n{n  —  l)/2  binary  classi¬ 
fiers  in  a  one-versus-one  classification  scheme.  Adopting  the  technique  of  [12],  a 
parameterized  sigmoid  function  is  applied  to  the  decision  value  of  each  binary  clas¬ 
sification  which  results  in  posterior  probabilities  of  both  classes.  Finally,  we  obtain 
the  posterior  for  each  class  /,  p(x  =  i\u),  using  the  pairwise  coupling  method  of  [19]. 

The  SVM-KNN  approach  [21]  combines  the  characteristics  of  both  the  KNN  and 
the  SVM  classifiers.  It  does  not  require  a  training  phase.  Instead,  predictions  are  per¬ 
formed  by  first  pruning  the  training  set.  Therefore,  the  k-nearest-neighbors  to  a  given 
query  u  are  identified.  Then,  a  multi-class  SVM  is  trained  online  using  the  pairwise 
distances  between  all  entries  of  the  union  of  the  query  and  the  neighbor  set.  Prior  to 
the  SVM  model  training,  these  distances  have  to  be  transformed  into  a  kernel  matrix. 
In  our  approach,  this  is  realized  by  applying  the  function  /(<%)  =  exp(— dfj/o2)  to 
the  pairwise  distances  dij.  As  distance  function  we  chose  the  L2  norm.  The  posterior 
probability  p{x\u)  is  then  obtained  by  applying  the  query  u  to  the  trained  SVM. 
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Classifier  selection  should  be  handled  with  care  since  each  approach  has  differ¬ 
ent  characteristics.  KNNs  and  PNNs  belong  to  the  class  of  lazy  learning  techniques. 
That  is,  all  computations  are  delayed  until  a  prediction  query  is  requested.  On  the 
one  hand,  this  renders  a  time-demanding  training  phase  unnecessary  which  is  ad¬ 
vantageous  if  the  underlying  phenomenon  changes  frequently.  On  the  other  hand, 
all  patterns  have  to  be  available  at  run-time  which  might  pose  a  problem  if  storage 
is  limited.  Given  that  the  acquired  training  set  consists  of  n  samples,  storage  require¬ 
ments  are  0(n-  d),  where  d  is  dimensionality  of  a  training  instance.  Furthermore, 
if  the  calculating  capacity  is  constrained  in  the  recall  phase,  the  desired  prediction 
frequency  might  not  be  accomplished  due  to  a  large  set  of  training  patterns.  For 
example,  when  using  the  KNN  classifier,  a  naive  approach  involves  0(n)  distance 
calculations  to  determine  the  k-nearest-neighbors.  Although  accelerating  data  struc¬ 
tures  like  M-trees  [20]  exist,  high-dimensional  nearest-neighbor  search  is  known  to 
be  a  non-trivial  task  suffering  from  the  curse  of  dimensionality. 

MLP  and  SVM  classifiers  typically  provide  compact  models,  resulting  in  a  fast  pre¬ 
diction  performance.  Model  training,  however,  is  computationally  much  more  de¬ 
manding  since  both  methods  iteratively  try  to  minimize  a  given  error  function.  The 
time  spent  on  choosing  a  classifier  with  a  good  generalization  behavior  is  signifi¬ 
cantly  increased  by  the  model  selection  process  which  has  to  consider  a  sufficiently 
large  set  of  candidate  model  parameter  settings. 

The  SVM-KNN  approach  is  characterized  by  an  involved  model  selection  and  test¬ 
ing  phase.  Since  a  class  prediction  also  requires  the  determination  of  the  k-nearest- 
neighbor  set  to  a  given  query,  the  training  set  has  to  be  present  at  run-time.  We 
note,  however,  that  this  approach  still  guarantees  predictions  performed  in  real-time. 
Hence,  we  included  SVM-KNN  classification  in  our  investigations. 

Table  1  summarizes  the  key  characteristics  for  the  proposed  classifiers:  the  respec¬ 
tive  model  parameter(s)  which  yielded  the  best  generalization  and  model  selection 
times  along  with  the  number  of  tested  model  candidates.  For  the  respective  best 
classification  model  we  further  present  the  training  time  using  data  contained  in  one 
fold  of  a  5 -fold  cross  validation  scheme,  the  average  testing  time  for  a  single  query, 
and  storage  requirements  (measured  in  kB).  We  performed  all  run-time  analyses  on 
a  Pentium D  3.0  GHz  desktop  PC.  For  the  storage  considerations,  we  represented 
each  floating  point  number  as  double ,  each  8  bytes  in  size. 


Table  1  The  respective  model  parameter(s),  the  number  of  considered  candidates  during  model 
selection,  model  selection  and  training  times,  prediction  complexity,  and  storage  requirements  of 
the  proposed  classifiers. 


KNN 

MLP 

PNN 

SVM 

SVM-KNN 

model  param. 

k  =  6 

hid.  =  96 

a  =  0.07 

C  =  9.05, a  =  0.02 

k  =  640,  C  =  9.05,  a  =  0.02 

model  sel.  cand. 

31 

8 

64 

196(14x14) 

k:  30;  C,a:  196 

model  sel.  time  (h) 

0:52:55 

32:34:00 

0:07:54 

24:55:47 

50:09:40 

training  time  (h) 

- 

1:08:43 

0:00:01 

0:00:54 

- 

testing  time  (ms) 

13.21 

0.02 

0.54 

1.07 

464.9 

storage  req.  (kB) 

763.5 

52.5 

763.5 

22.5 

763.5 
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Fig.  1  The  employed  terrain  types:  1:  indoor  floor,  2:  asphalt,  3:  gravel,  4:  grass,  5:  boule  court. 


4  Experimental  Setup 

In  our  experiments,  an  Xsens  MTi  altitude  and  heading  reference  system  was 
mounted  on  an  aluminum  plate  on  top  of  our  RWI ATRV- Jr  outdoor  robot  to  measure 
vibration  signals  in  left-right  direction  at  100  Hz.  During  data  acquisition,  the  robot 
navigated  over  five  different  terrains  (Fig.  1):  indoor  PVC  floor,  asphalt,  gravel, 
grass,  and  clay  (the  surface  of  a  boule  court).  To  not  constrain  the  model  to  work  at 
a  certain  driving  speed,  we  varied  the  speed  between  0.2,  0.4,  and  0.6  m/s.  In  total, 
the  dataset  consists  of  7635  patterns,  corresponding  to  approximately  1.5  hours  of 
robot  navigation. 

We  performed  individual  terrain  classifications  using  vibration  data  acquired  dur¬ 
ing  1.28  s  of  robot  travel.  For  two  consecutive  segments  we  permit  an  overlap  of  28 
samples  to  achieve  a  prediction  frequency  of  1  Hz.  The  combination  of  terrain  class 
predictions  was  realized  by  our  adaptive  Bayes  filter  approach  introduced  in  Sect.  2. 
To  quantify  the  performance  of  the  latter,  we  applied  the  following  evaluation  pro¬ 
cedure:  We  assembled  consecutive  vibration  segments  representing  the  same  terrain 
type  to  give  a  travel  distance  of  constant  length.  Then,  assembled  segments  of  vary¬ 
ing  terrain  types  were  grouped  together  yielding  the  final  test  set.  In  different  ex¬ 
periments,  we  varied  the  distance  covered  by  a  robot  before  it  reaches  a  new  terrain 
class.  This  distance  is  denoted  as  the  travel  distance  d  (measured  in  meters)  in  the 
following.  In  each  experiment,  d  was  chosen  from  the  set  d  E  {0;4;  8;  12;  16}.  The 
0  m  experiment  describes  the  worst  case  scenario  for  approaches  based  on  tempo¬ 
ral  filtering.  Here,  single  segments  of  varying  terrain  classes  are  concatenated,  each 
representing  data  acquired  during  1  s  of  robot  travel.  Since  the  robot  speed  varies 
between  0.2,  0.4,  and  0.6  m/s,  this  experiment  includes  travel  distances  of  0.2,  0.4, 
and  0.6  m.  Note  that  according  to  the  confusion  matrix,  certain  terrain  transitions 
are  easier  to  detect  than  other  ones.  Hence,  the  results  depend  on  the  order  in  which 
assembled  terrain  segments  of  varying  terrain  type  are  presented.  We  minimized 
this  effect  by  randomly  permuting  this  order  and  averaging  the  classification  results 
determined  after  20  reruns  of  a  particular  experiment. 

As  quality  measure  we  used  the  true  positive  rate  (TPR).  It  is  the  ratio  (measured 
in  per  cent)  of  the  number  of  correct  predictions  for  which  the  predicted  class  xt 
equals  the  actual  class  xt  and  the  number  of  instances  contained  in  the  test  set.  We 
derived  the  prediction  performance  using  5-fold  cross  validation  and  averaging  the 
true  positive  rate  over  all  five  folds. 
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Table  2  Prediction  performance  (in  %)  for  varying  classifiers  and  travel  distances  (dist.)  using 
single  observation-based  (SO)  and  adaptive  Bayes  filter-based  (AB)  terrain  classification. 


dist.  (m) 

0 

4 

8 

12 

16 

approach 

SO 

AB 

SO 

AB 

SO 

AB 

SO 

AB 

SO 

AB 

SVM-KNN 

89.1 

89.1 

89.8 

93.3 

89.8 

94.8 

91.8 

96.7 

91.8 

97.0 

SVM 

88.5 

88.4 

89.3 

92.4 

89.3 

94.2 

91.0 

95.8 

90.9 

96.1 

MLP 

86.7 

86.7 

87.4 

90.6 

87.4 

91.0 

89.0 

92.4 

88.9 

92.1 

KNN 

80.6 

79.7 

81.2 

83.6 

81.1 

85.0 

82.9 

88.5 

82.7 

88.4 

PNN 

79.2 

79.1 

80.2 

82.4 

80.1 

82.7 

83.7 

86.1 

82.8 

85.3 

(a)  (b) 


Fig.  2  (a)  True  positive  rates  for  the  adaptive  Bayes  approach  and  (b)  the  relative  increase  of 
classification  performance  for  the  adaptive  Bayes  approach  related  to  single  observation-based 
classification  when  varying  the  classifier  and  the  travel  distance. 


5  Experimental  Results 

Table  2  shows  the  results  for  the  proposed  classifiers  when  using  single  observations 
(SO)  and  Bayes-filtered  posterior  probabilities  of  recent  predictions  (AB).  Note  that 
the  true  positive  rate  for  the  single  observation-based  approach  differs  between  vary¬ 
ing  experiments.  This  is  due  to  the  model  evaluation  procedure  introduced  in  the 
previous  section  which  selects  a  varying  test  set  for  each  travel  distance. 

Related  to  both  the  SO  and  the  AB  approach,  the  SVM-KNN  technique  yields  the 
best  prediction  performance,  followed  by  S  VM,  MLP,  KNN,  and  PNN  classification 
(Fig.  2(a)).  The  differences  in  the  true  positive  rates  of  the  applied  classifiers  proved 
to  be  statistically  significant,  using  a  two-tailed  t-test  at  a  significance  level  of  1%. 
The  combined  support  vector  machine  and  k-nearest-neighbor  approach  benefits 
from  the  reduced  training  set  resulting  in  another  configuration  of  the  separating 
hyperplane.  This  hyperplane  results  in  a  higher  generalization  as  compared  to  the 
one  established  by  the  SVM  approach  which  uses  all  training  patters  at  once.  The 
classification  performance  of  each  classifier  is  also  reflected  in  the  increase  of  the 
true  positive  rate  obtained  when  using  the  adaptive  Bayes  technique  in  comparison 
with  the  single  observation  approach  (Fig.  2(b)):  the  larger  the  true  positive  rate 
for  a  given  classifier,  the  larger  the  benefit  of  using  temporally-filtered  predictions. 
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This  statement  holds  for  all  classifiers  but  the  KNN  approach:  Here,  the  adaptive 
Bayes  technique  results  in  the  largest  increase  of  the  true  positive  rate.  Investigations 
revealed  that  in  the  case  of  a  misclassification  the  posterior  probability  of  the  erro¬ 
neously  predicted  class  pe  was  rather  small  on  average,  pe  =  0.64  ±0.19,  in  compar¬ 
ison  with  the  ones  obtained  for  a  correctly  predicted  class,  pc  =  0.87  ±0.18.  Hence, 
the  transition  into  another,  that  is,  erroneous  system  state  becomes  less  likely  if  the 
system  previously  resided  in  the  correct  state,  no  terrain  transition  occurred,  but  the 
classifier  erroneously  estimated  the  wrong  terrain  class.  We  obtained  similar  results 
using  SVM  and  SVM-KNN  classifiers:  Here,  the  posterior  probability  of  erroneous 
predictions  was  pe  =  0.69  ±0.1 8  and  pe  =  0.66  ±0.1 8,  respectively.  PNN  and  MLP 
classifiers  perform  wrong  predictions  with  higher  confidences:  pe  =  0.87  ±0.17  and 
pe  =  0.84  ±0.17,  respectively.  Referring  to  Fig.  2(b),  this  is  another  explanation  for 
the  smaller  increase  in  TPR  compared  to  other  classifiers  which  provide  more  uncer¬ 
tain  erroneous  predictions.  In  addition,  we  observed  that  the  posterior  probabilities 
of  correct  predictions  were  larger  than  0.87  on  average  for  all  classifiers. 

Fig.  2(b)  shows  the  true  positive  rates  when  applying  the  Bayes  filter  technique  to 
the  proposed  classifiers.  It  reveals  that  all  classifiers  are  not  influenced  by  high- 
frequent  terrain  changes  in  a  significant  manner  when  embedded  into  our  Bayes 
filter  prediction  scheme. 


6  Conclusion 

In  this  paper,  we  systematically  investigated  the  applicability  of  several  posterior 
probability  estimation  techniques  in  the  context  of  terrain  classification  based  on 
temporal  coherence.  We  exploited  temporal  coherence  using  a  Bayes  filter  approach 
which  takes  several  recent  terrain  class  predictions  into  account.  Depending  on  the 
choice  of  the  classifier  and  the  distance,  a  robot  has  to  navigate  over  a  certain  terrain 
type  before  a  terrain  transition  occurs,  the  classification  performance  increased  by 
up  to  6.9%.  This  number  denotes  the  increase  of  classification  performance  related 
to  a  classification  approach  based  on  individual  observations  only.  We  showed  that 
the  Bayes  filtering  approach  was  nearly  always  superior  to  the  single-observation 
approach  with  the  only  exception  of  the  KNN  classifier  at  a  travel  distance  of  less 
than  or  equal  to  0.6  m.  The  significantly  best  experimental  results  were  obtained 
using  a  combined  support  vector  machine  and  k-nearest  neighbor  approach  which 
has  not  been  employed  in  the  domain  of  terrain  classification  so  far.  Further  in¬ 
vestigation  revealed  that  the  various  classifiers  did  not  only  differ  in  classification 
performance  but  also  in  the  confidence  of  erroneous  predictions.  In  the  context  of 
Bayesian  filtering  this  is  an  important  issue  since  a  decrease  in  this  confidence  re¬ 
sults  in  a  decreased  influence  of  wrong  predictions  on  the  final  classification. 

As  a  further  contribution  we  examined  the  proposed  classifiers  with  respect  to  their 
limiting  factors  such  as  storage  requirements,  prediction  times,  model  generation 
times,  and  model  selection  times.  The  results  provide  criteria  for  choosing  an  ap¬ 
propriate  classifier  for  a  variety  of  hardware  configurations. 
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Abstract  Many  important  scientific  studies,  particularly  those  involving  climate 
change,  require  weather  measurements  from  the  ice  sheets  in  Greenland  and  Antarc¬ 
tica.  Due  to  the  harsh  and  dangerous  conditions  of  such  environments,  it  would  be 
advantageous  to  deploy  a  group  of  autonomous,  mobile  weather  sensors,  rather  than 
accepting  the  expense  and  risk  of  human  presence.  For  such  a  sensor  network  to 
be  viable,  a  method  of  navigating,  and  thus  a  method  of  terrain  assessment,  must  be 
developed  that  is  tailored  for  arctic  hazards.  An  extension  to  a  previous  arctic  terrain 
assessment  method  is  presented,  which  is  able  to  produce  dense  terrain  slope  esti¬ 
mates  from  a  single  camera.  To  validate  this  methodology,  a  set  of  prototype  arctic 
rovers  have  been  designed,  constructed,  and  fielded  on  a  glacier  in  Alaska. 


1  Introduction 

An  important  aspect  of  autonomous  field  robotic  navigation  is  terrain  assessment. 
When  an  autonomous  agent  is  deployed  in  unstructured,  natural  environments,  the 
exact  condition  of  the  environment  cannot  be  known  ahead  of  time.  Instead,  the 
agent  must  assess  the  terrain  condition  locally,  then  revise  its  navigation  plan  as  nec¬ 
essary.  Much  of  the  literature  in  the  area  of  terrain  assessment  focuses  on  desert  en¬ 
vironments,  arising  from  the  needs  of  NASA’s  Mars  rovers  and  the  first  two  DARPA 
Grand  Challenge  events  [2,  4,  6]. 

In  contrast,  little  work  has  focused  on  navigating  in  arctic  environments,  despite 
the  scientific  importance  of  such  areas.  Though  many  scientists  believe  the  condi¬ 
tion  of  the  giant  ice  sheets  in  Greenland  and  Antarctica  are  a  key  to  understanding 
global  climate  change,  there  is  still  insufficient  data  to  accurately  predict  the  future 
behavior  of  those  ice  sheets.  While  satellites  are  able  to  map  the  ice  sheet  elevations 
with  increasing  accuracy,  data  about  general  weather  conditions  (i.e.  wind  speed, 
barometric  pressure,  etc.)  must  be  measured  at  the  surface. 


1 


2 


Stephen  Williams  and  Ayanna  M.  Howard 


In  order  to  obtain  measurements,  human  expeditions  must  be  sent  to  these  remote 
and  dangerous  areas.  Alternatively,  a  group  of  autonomous  robotic  rovers  could  be 
deployed  to  these  same  locations,  mitigating  the  cost,  effort,  and  danger  of  human 
presence.  For  this  to  be  a  viable  solution,  a  method  for  navigating  in  the  arctic, 
and  thus  of  assessing  arctic  terrain,  must  be  developed.  This  paper  extends  the  work 
presented  in  [10],  creating  dense  slope  estimates  of  the  terrain  from  a  single  camera. 
Sect.  2  briefly  describes  the  types  of  terrain  likely  to  be  encountered  in  the  arctic 
regions  of  Greenland  or  Antarctica.  Sect.  3  details  the  slope  assessment  algorithm. 
A  set  of  prototype  arctic  rovers  have  been  designed  and  constructed.  A  description 
of  the  units  and  the  field  tests  conducted  is  presented  in  Sect.  4.  The  slope  estimate 
results  from  the  field  tests  are  shown  in  Sect.  5.  Finally,  conclusions  and  future  work 
are  discussed  in  Sect.  6. 


2  Environment 

Despite  being  covered  by  snow,  arctic  regions  present  a  large  assortment  of  terrain 
challenges,  a  small  sample  of  these  are  shown  in  Fig.  1.  Large  quantities  of  fresh 
surface  snow  can  be  present  during  certain  times  of  the  year.  This  fresh  snow  is 
soft,  creating  a  potential  sinking  hazard  for  wheeled  vehicles.  The  soft  snow  is  also 
more  readily  melted,  causing  a  dimpling  of  the  surface,  referred  to  as  “sun  cups,” 
which  can  span  0.5  meters  or  more.  Over  time  the  winds  harden  the  snow  surface 
making  it  more  amenable  to  locomotion.  However,  these  same  winds  also  sculpt  the 
snow  into  dune-like  structures  that  can  be  as  large  as  one  meter,  again  impeding 
motion.  The  underlying  ice  sheet  is  also  responsible  for  several  types  of  terrain 
hazards.  As  the  ice  sheet  flows,  forces  build  due  to  differential  velocities  of  different 
ice  sections.  These  forces  can  cause  nearly  vertical  fractures  in  the  ice  known  as 
crevasses.  Crevasses  can  be  as  deep  as  30  meters  and  are  often  covered  with  snow, 
making  their  detection  all  the  more  difficult.  A  narrow  crevasse  is  shown  in  Fig. 
1(c),  which  becomes  obscured  by  snow  toward  the  top  of  the  image.  In  the  thinner 
regions  of  the  ice  sheet,  the  surface  is  affected  by  the  underlying  mountains,  causing 
significant  local-scale  elevation  changes.  Even  on  seemingly  flat  terrain,  the  actual 
snow  depth  can  change  drastically,  with  the  ice  sheet  exposed  in  some  locations, 
and  covered  by  several  meters  of  snow  in  others. 


3  Slope  Assessment 

The  slope  estimation  technique  presented  in  [10]  divided  the  image  into  large  blocks 
in  which  the  surface  texture  was  analyzed.  A  single  slope  estimate  was  produced 
which  was  aligned  with  the  predominate  surface  texture  direction.  The  resulting 
estimates,  although  noisy,  were  shown  to  be  representative  of  the  actual  slope  within 
a  simulated  environment,  and  sufficient  input  for  a  slope-avoidance  control  scheme. 
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Fig.  1  Images  from  the  top  of  Mendenhall  Glacier,  Alaska  showing  (a)  visible  sun  cups,  (b)  a  large 
section  of  exposed  ice  sheet,  and  (c)  a  small  crevasse  visible  through  the  snow,  (d)  An  image  from 
an  analogous  site  on  the  Arikaree  Glacier,  Colorado  showing  the  potential  steep  slopes  in  glacial 
terrain. 


Presented  below  is  an  improvement  upon  this  algorithm  which  results  in  a  set  of 
dense  slope  estimates  for  the  scene. 

In  images  of  arctic  terrain,  the  surface  texture  has  very  low  contrast.  In  order 
to  analyze  this  texture,  the  foreground  contrast  must  first  be  boosted.  An  adaptive, 
nonlinear  preprocessing  stage  has  been  introduced,  originally  formulated  to  enhance 
x-ray  images  and  CT  scans  [9].  Contrast  limited  adaptive  histogram  equalization 
(CLAHE)  separates  the  image  into  different  contextual  regions.  Within  each  region, 
a  histogram  equalization  procedure  is  calculated.  To  prevent  over-enhancement  of 
local  areas,  a  contrast  limit  is  imposed.  In  effect,  this  applies  an  upper  bound  to  the 
slope  of  the  gradient  at  a  specific  location,  resulting  in  smoothly  varying  contrast. 

However,  the  presence  of  image  distractors,  such  as  background  mountains,  have 
an  adverse  effect  on  both  the  contrast  enhancement  and  the  subsequent  slope  es¬ 
timates.  A  method  of  histogram  thresholding,  presented  in  [10]  has  been  applied 
here.  It  is  assumed  that  the  majority  of  the  image  is  filled  with  the  snowy  region. 
Consequently,  in  the  histogram  of  the  image,  the  largest  peak  should  be  associated 
with  the  grayscale  values  of  the  snow.  An  adaptive  threshold  based  on  the  bound¬ 
aries  of  this  peak  produces  an  image  mask  which  can  effectively  separate  the  region 
of  interest  from  unwanted  objects  and  areas.  Fig.  2  shows  the  results  of  the  mask 
and  contrast  enhancement  on  a  single  exemplar  glacial  image.  For  the  first  time,  the 
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(a)  (b)  (c) 


Fig.  2  (a)  An  image  from  Mendenhall  Glacier,  Alaska,  (b)  the  mask  produced  by  the  histogram 
threshold  procedure  which  separates  the  region  of  interest  from  the  background  objects,  and  (c)  the 
results  of  the  CLAHE  contrast  enhancement  of  the  masked  image.  For  the  first  time,  the  underlying 
structure  of  the  scene  is  clearly  visible. 


underlying  scene  structure  is  clearly  visible.  Although,  the  image  noise  has  clearly 
been  amplified  as  well. 

The  enhanced  snow  texture  exhibits  small-scale  directional  details,  which  are  vi¬ 
sually  similar  to  those  of  fingerprints.  In  the  area  of  fingerprint  enhancement,  where 
it  is  desired  to  find  and  follow  the  small  ridge  details  of  a  print,  it  is  common  to 
create  an  orientation  image  to  aid  in  the  processing  [3,  5].  A  least  square  estimate 
procedure  for  calculating  this  orientation  is  presented  in  [3].  In  a  similar  fashion,  the 
final  slope  estimate  is  produced  by  finding  the  least  square  estimate  of  the  dominant 
Fourier  spectrum  direction  in  a  neighborhood  around  each  pixel. 

To  calculate  the  orientation  of  a  given  pixel,  (/,/),  the  image  gradient  within  a 
neighborhood  of  that  pixel  is  first  calculated.  Then  the  two  component  vectors,  vx 
and  vy  ,  are  generated,  as  described  in  Equations  1  and  2.  The  orientation,  0 ,  is 
then  defined  as  the  least  squares  solution  to  Equation  3.  The  entire  slope  calculation 
process  can  be  processed  in  real-time. 


Vx(*',;)=  E  2  dx(u,v)dy(u,v) 

a) 

neighborhood 

Vy  0,  j)  =  E  dx  (M> v)  “  dy  (M> v) 

(2) 

neighborhood 

’"-i  ^ 
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4  Field  Tests 

To  validate  the  slope  assessment  algorithm,  three  prototype  mobile  weather  sen¬ 
sor  nodes  were  constructed.  The  rovers,  referred  to  as  “Sno-motes”,  were  subse¬ 
quently  fielded  on  a  frozen  lake  near  Wapakoneta,  Ohio  and  on  Mendenhall  Glacier 
in  Juneau,  Alaska. 
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4.1  Sno-mote  Mkl 


A  1/10  scale  snowmobile  chassis  was  selected  for  the  prototype  platform,  endow- 
in g  the  rover  with  an  inherent  all-terrain  drive  system.  The  platform  was  modified  to 
include  an  ARM-based  processor  running  a  specialized  version  of  Linux.  The  moth¬ 
erboard  offered  several  serial  standards  for  communication,  in  addition  to  wifi  and 
bluetooth.  A  daughterboard  provided  an  ADC  unit  and  PWM  outputs  for  controlling 
servos.  The  drive  system  was  modified  to  accept  PWM  motor  speed  commands,  and 
the  steering  control  was  replaced  with  a  high-torque  servo.  For  ground  truth  posi¬ 
tion  logging,  a  GPS  unit  connects  to  the  processor  via  the  bluetooth  interface,  while 
robot  state  and  camera  images  are  sent  directly  to  an  external  control  computer  via 
the  wifi  link.  To  simulate  the  science  objectives  of  the  mobile  sensor,  a  weather- 
oriented  sensor  suite  was  added  to  the  rover.  The  deployed  instrument  suite  includes 
sensors  to  measure  temperature,  barometric  pressure,  and  relative  humidity. 


4.2  Alaska  Test  Site 


The  three  “Sno-mote”  platforms  and  related  equipment  were  shipped  to  Juneau, 
Alaska  for  field  testing.  Two  potential  test  sites  were  selected  based  on  the  rele¬ 
vance  of  weather  data,  the  similarity  of  the  terrain  to  arctic  conditions,  and  logistics. 
Site  1,  Lemon  Creek  Glacier,  has  been  the  subject  of  annual  mass  balance  mea¬ 
surements  since  1953  as  part  of  the  Juneau  Icefield  Research  Program  (JIRP)  [8], 
making  weather  measurements  in  this  area  particularly  relevant.  The  second  site, 
Mendenhall  Glacier,  is  one  of  Alaska’s  most  popular  tourist  attractions  [7].  The  cur¬ 
rent  public  interest  of  this  particular  site  makes  additional  information  valueable. 
Both  sites  are  only  accessible  via  helicopter. 

Helicopter  travel  to  glacial  areas  is  heavily  dependent  on  the  weather  conditions, 
particularly  low  cloud  deck  heights.  This  presents  a  dangerous  “white  out”  situation 
for  the  helicopter  pilot  in  which  the  snow-covered  peaks,  the  ground  of  the  landing 
site,  and  the  sky  are  all  indistinguishably  white.  During  the  course  of  our  flight  to 
Lemon  Creek  such  a  situation  was  determined  to  exist,  forcing  the  group  to  abandon 
the  site  for  the  day.  A  few  images  were  acquired  from  this  site  before  departure,  a 
sample  of  which  is  shown  in  Fig.  3(a). 

The  weather  conditions  preventing  travel  to  either  of  the  test  site  finally  lifted 
on  Day  4,  allowing  travel  to  Mendenhall.  The  site  surface  is  visually  flat  and  cov¬ 
ered  with  snow,  though  there  are  sections  of  the  terrain  where  the  underlying  ice 
sheet  is  exposed.  Despite  the  flat  appearance,  the  snow  varied  in  depth  from  a  few 
centimeters  to  over  a  meter.  This  snow  was  deposited  recently  and  was  quite  soft. 
Upon  arrival  at  the  site,  a  test  area  was  explored  with  ice-axes  to  ensure  it  was  safe. 
Cracks  in  the  underlying  ice,  called  crevasses,  are  often  completely  concealed  by 
surface  snow. 

The  rovers  were  driven  manually  to  assess  the  mobility  performance  in  the  differ¬ 
ent  snow  conditions  present.  During  these  traverses  it  was  discovered  that  the  plat- 


6 


Stephen  Williams  and  Ayanna  M.  Howard 


^  --a  muwm 

% 

(a)  (b) 


Fig.  3  (a)  A  sample  of  images  acquired  from  Lemon  Creek  Glacier,  Alaska  before  weather  condi¬ 
tions  forced  the  site  to  be  abandoned  for  the  day.  Some  of  the  underlying  mountain  range  is  visible 
though  the  glacier  surface,  but  the  terrain  is  predominately  white,  with  the  slope  characteristics 
almost  invisible,  (b)  A  Sno-motes  deployed  at  Mendenhall  Glacier  in  Juneau,  Alaska. 


form  suffered  from  stability  issues.  Due  to  the  narrow  track  footprint  in  the  rear,  the 
chassis  would  often  roll  sideways  when  attempting  to  navigate  perturbations  in  the 
snow  surface.  Additionally,  the  snowmobile  would  sink  in  the  fresh  snow,  causing 
the  DC  drive  motor  to  stall  from  excess  torque.  Due  to  the  chassis  limitations,  a  set 
of  short  traverses  were  performed  in  selected  locations.  During  these  traverses,  the 
local  temperature,  barometric  pressure,  relative  humidity,  GPS  location,  and  camera 
images  were  all  logged  at  2  Hz  and  timestamped  to  ensure  proper  off-line  recon¬ 
struction  and  analysis.  Fig.  1(a)  -  1(c)  show  some  sample  images  acquired  at  the 
Mendenhall  test  site. 


4.3  Sno-mote  Mk2 

The  main  reason  tracked  vehicles  are  used  for  snow  traversal  is  the  large  area  of  the 
track  distributes  the  vehicle  weight,  allowing  it  to  “float”  on  the  surface.  Possibly 
the  most  capable  snow  vehicle  is  the  “Alpina  Sherpa”  [1],  which  was  designed  with 
two  tracks  to  further  reduce  the  applied  pressure.  Due  to  the  discovered  mobility 
issues  with  the  original  platform,  a  set  of  chassis  modifications  were  designed  and 
implemented,  with  inspiration  taken  from  the  “Sherpa”.  The  original  front  suspen¬ 
sion  mechanism  was  replaced  by  a  passive  double- wishbone  system,  increasing  the 
ski-base  over  30%.  The  rear  track  system  was  replaced  with  a  custom,  dual-track  de¬ 
sign,  which  both  widened  the  rear  footprint  and  effectively  doubled  the  snow  contact 
surface  area.  A  500  W  brushless  motor  and  high-current  speed  controller  drive  the 
new  track  system.  The  overall  increase  in  the  platform  width  drastically  improved 
the  platform’s  stability  and  role  characteristics.  Fig.  4  shows  the  modified  chassis, 
as  deployed  on  a  frozen  lake  near  Wapakoneta,  Ohio. 
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Fig.  4  The  modified  chassis 
of  the  second  generation 
“Sno-mote”  deployed  near 
Wapakoneta,  Ohio.  Clearly 
visible  is  the  new  dual-track 
drive  system. 


4.4  Ohio  Test  Site 

A  test  site  near  Wapakoneta,  Ohio  was  selected  to  verify  the  performance  of  the  new 
chassis.  The  site  was  blanketed  with  8-12  inches  of  fresh  snow  next  to  a  frozen  lake. 
Several  long  traverses  were  conducted,  which  transitioned  from  land  to  lake  several 
times.  During  these  traverses,  the  GPS  location  and  camera  image  were  logged  at 
15  Hz  and  timestamped.  The  lake  bank  consisted  of  irregularly  spaced  large  rocks, 
between  which  large  amounts  of  snow  had  collected,  forming  a  drivable  incline 
between  10°  and  30°. 

The  improved  chassis  performed  well  during  the  tests,  never  rolling,  even  when 
negotiating  a  path  between  rocks  up  a  20°  slope.  While  it  was  still  possible  for 
the  chassis  to  loose  traction,  especially  in  very  soft  snow  or  up  steep  inclines,  the 
new  drive  motor  was  never  forced  into  a  stall  condition.  However,  one  unexpected 
observation  from  these  test  was  that  the  control  computer,  which  consisted  of  a 
consumer-grade  laptop,  ceased  to  operate  when  its  temperature  dropped  below  20° F. 
The  “Sno-mote”  control  computer  and  hardware  were  unaffected  by  the  cold. 


5  Results 

The  slope  estimation  algorithm,  presented  in  Sect.  3,  has  been  applied  to  the  images 
acquired  during  the  field  tests.  For  comparison,  the  original  slope  estimate  tech¬ 
nique,  presented  in  [10],  has  also  been  calculated.  The  results  of  both  techniques 
are  shown  in  Fig.  5.  While  both  methods  show  the  general  regions  in  front  of  the 
camera  to  be  flat,  the  denser  information  of  the  proposed  method  is  better  able  to 
capture  the  smaller  scale  surface  trends.  This  is  particularly  evident  in  the  lower  left 
of  the  images  in  Fig.  5(a)  and  5(b).  The  proposed  method  accurately  indicates  the 
slopes  around  a  depression  in  the  snow,  whereas  the  original  method  provides  only 
a  single,  slightly  upward  slope  indication.  Also,  the  new  method  is  able  to  handle 
the  ice  as  well  as  the  snow  image  textures.  The  original  method  provides  spurious 
measurements  in  the  ice  regions  that  do  not  reflect  the  true  terrain  grade.  In  the  Ohio 
images,  Fig.  5(c)  and  5(d),  the  original  method  completely  ignores  the  small  dune 
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Fig.  5  (a,  b)  An  image  taken  during  the  field  tests  at  Mendenhall  Glacier,  Alaska  has  been  pro¬ 
cessed  by  the  original  slope  estimation  procedure  and  the  proposed  method.  Similarly,  (c,  d)  an 
image  from  the  Ohio  field  tests  has  been  processed  by  both  methods.  The  proposed  method  pro¬ 
duces  much  denser  estimates  that  are  better  able  to  capture  smaller  scale  surface  details. 


structures,  whereas  the  new  method  does  indicate  the  sloping  regions  on  either  side 
of  both  structures. 

Examples  of  processed  arctic  terrain  are  provided  in  Fig.  6.  In  the  first  image 
set  from  Lemon  Creek  Glacier,  the  terrain  grade  in  the  original  image  is  virtually 
invisible.  Yet,  the  estimate  process  is  able  to  provide  dense  estimates,  even  in  the 
areas  that  originally  seemed  uniformly  white.  The  second  pair  of  images  illustrates 
a  large  crevasse  on  Mendenhall  Glacier.  The  slope  estimation  process  is  able  to 
handle  both  the  snow  and  exposed  ice  textures  without  modification.  The  estimates 
provided  clearly  show  the  snow  and  ice  sloping  into  the  mouth  of  the  crevasse,  while 
a  relatively  safe  area  exists  in  the  far  left. 


6  Conclusions 


When  navigating  in  arctic  terrain,  the  local  terrain  slope  is  an  important  factor  when 
determining  traversabilty.  Vehicle  limitations  may  impose  terrain  grade  limits,  or 
local  areas  of  steep  decent  may  imply  hazards.  A  purely  visual  slope  estimation 
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(a) 


(b) 


(c) 


Fig.  6  Images  (a)  and  (c)  show  an  original  image  from  Lemon  Creek  glacier,  and  the  enhanced  and 
processed  version,  respectively.  Despite  the  surface  texture  being  nearly  invisible  in  the  original  im¬ 
age,  the  slope  estimation  process  is  able  to  produce  dense,  visually  consistent  slope  measurements. 
Images  (b)  and  (d)  show  an  image  of  a  large  crevasse  at  Mendenhall  glacier  and  the  resulting  slope 
estimates.  The  slope  profiles  in  (d)  clearly  show  the  elevation  changes  at  the  edge  of  the  crevasse. 
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technique  has  been  presented  which  creates  dense  slope  estimates  from  a  single 
image,  even  in  the  inherently  low  contrast  environment  of  the  arctic. 

A  set  of  prototype  rovers  have  been  constructed,  based  upon  a  snowmobile  de¬ 
sign,  and  fielded  on  a  frozen  lake  in  Ohio,  as  well  as  on  Mendenhall  Glacier  in 
Juneau,  Alaska.  A  sample  of  the  slope  estimation  results  from  these  field  tests  have 
been  included.  Qualitatively,  the  results  appear  consistent  with  human  perceived 
slope  determinations,  and  are  an  improvement  over  a  previously  presented  method, 
both  in  terms  of  estimate  density  and  slope  misclassification. 

In  the  future,  these  slope  estimates  will  be  developed  into  a  full  traversabilty 
assessment,  were  drivable  terrain  is  classified  as  safe  and  terrain  hazards  are  labeled. 
This  would,  in  turn,  be  used  by  the  navigation  and  path  planning  system  to  plot  safe 
trajectories. 
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Abstract 

The  approach  investigated  in  this  work  employs  three-dimensional  LADAR 
measurements  to  detect  and  track  pedestrians  over  time.  The  sensor  is  em¬ 
ployed  on  a  moving  vehicle.  The  algorithm  quickly  detects  the  objects  which 
have  the  potential  of  being  humans  using  a  subset  of  these  points,  and  then 
classifies  each  object  using  statistical  pattern  recognition  techniques.  The  al¬ 
gorithm  uses  geometric  and  motion  features  to  recognize  human  signatures. 
The  perceptual  capabilities  described  form  the  basis  for  safe  and  robust  nav¬ 
igation  in  autonomous  vehicles,  necessary  to  safeguard  pedestrians  operating 
in  the  vicinity  of  a  moving  robotic  vehicle. 


1  INTRODUCTION 

The  ability  to  avoid  colliding  with  other  objects  is  essential  in  autonomous 
vehicles,  especially  in  cases  where  they  operate  in  close  proximity  to  people. 
The  timely  detection  of  a  pedestrian  makes  the  vehicle  aware  of  a  potential 
danger  in  its  vicinity,  and  allows  it  to  modify  its  course  accordingly.  There  is 
a  large  body  of  work  done  using  laser  line  scanners  as  the  primary  sensor  for 
pedestrian  detection  and  tracking.  In  our  group,  we  have  developed  detection 
and  tracking  systems  using  SICK™  laser  line  scanners;  these  implementa¬ 
tions  work  well  in  situations  where  the  ground  is  relatively  flat  [5].  However, 
a  3D  LADAR  (i.e.  one  who  produces  a  set  of  3D  points,  or  point  cloud) 
captures  a  more  complete  representation  of  the  environment  and  the  objects 
within  it.  In  [6],  we  presented  an  algorithm  that  detects  pedestrians  from  3D 
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data.  Its  main  improvement  over  the  version  with  2D  data  was  that  it  con¬ 
structs  a  ground  elevation  map,  and  uses  it  to  eliminate  ground  returns.  This 
allows  pedestrian  detection  even  when  the  surrounding  ground  is  uneven.  To 
classify  the  humans  the  algorithm  uses  motion,  size,  and  noise  features.  Per¬ 
sons  are  classified  well  as  long  as  they  are  moving.  However,  there  are  still 
too  many  false  positives  when  classifying  stationary  humans. 

In  this  paper,  we  describe  a  strategy  to  detect  and  classify  humans  using 
the  full  3D  point  cloud  of  the  object.  This  will  improve  the  classification  of 
both  moving  and  static  pedestrians.  However,  the  improvement  will  be  most 
significant  for  static  humans.  The  algorithm  quickly  detects  the  objects  that 
have  the  potential  of  being  humans  using  a  subset  of  the  point  cloud,  and 
then  classifies  each  object  using  statistical  pattern  recognition  techniques. 
We  present  experimental  results  of  detection  performance  using  3D  LADAR, 
which  were  obtained  from  field  tests  performed  on  a  Demo  III  XUV  [7]. 


2  RELATED  WORK 

Some  researchers  have  applied  classification  techniques  to  the  detection  and 
tracking  problem.  The  approach  reported  in  [1]  applies  AdaBoost  to  train  a 
strong  classifier  from  simple  features  of  groups  of  neighboring  points.  This 
work  focuses  on  2D  range  measurements.  Examples  using  three-dimensional 
data  include  [4],  where  3D  scans  are  automatically  clustered  into  objects 
and  modeled  using  a  surface  density  function.  A  Bhattacharya  similarity 
measure  is  optimized  to  register  subsequent  views  of  each  object  enabling 
good  discrimination  and  tracking,  and  hence  detection  of  moving  objects. 
In  [3],  the  authors  describe  a  pedestrian  detection  system  which  uses  stereo 
vision  to  produce  a  3D  point  cloud,  and  then  classifies  the  cloud  according 
to  the  point  shape  distribution  considering  the  first  two  central  moments  of 
the  2D  projections  using  a  naive  Bayes  classifier.  Motion  is  also  used  as  a  cue 
for  human  detection. 

In  [8]  the  authors  report  an  algorithm  capable  of  detecting  both  stationary 
and  moving  humans.  Their  approach  uses  multi-sensor  modalities  including 
3D  LADAR  and  long  wave  infrared  video  (LWIR).  Similarly,  in  [9]  the  same 
research  group  presents  a  technique  for  detecting  humans  that  combines  the 
use  of  3D  LADAR  and  visible  spectrum  imagery.  In  both  efforts  the  authors 
employ  a  2D  template  to  extract  features  from  the  shape  of  an  object.  Among 
other  differences,  as  opposed  to  our  work,  they  extract  a  shape  template  from 
the  projection  in  only  one  plane,  and  compute  a  measure  of  how  uniformly 
distributed  the  returns  are  across  the  template. 
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3  ALGORITHM  DESCRIPTION 

In  this  section,  the  algorithm  for  pedestrian  detection  and  classification  is 
described.  In  our  approach,  since  operation  in  real  time  is  a  chief  concern, 
we  do  object  detection  and  tracking  in  a  2D  data  subset  first,  and  then 
use  the  object’s  position  and  size  information  to  partition  the  set  of  3D 
measurements  into  smaller  groups,  for  further  analysis.  We  describe  these 
steps  in  the  following  sections. 


3.1  Projection  into  2D  Plane 


To  reduce  the  computational  cost  of  processing  the  entire  point  cloud,  we 
initially  isolate  a  2D  virtual  slice,  which  contains  only  points  located  at  a 
certain  height  above  ground.  As  shown  in  Fig.  1,  a  3D  scanner  produces  a 
point  cloud,  from  which  a  “slice”  is  projected  onto  the  2D  plane,  resulting  in 
a  virtual  scan  line.  This  scan  line  is  a  vector  of  range  measurements  coming 
from  consecutive  bearings,  which  resembles  the  kind  of  data  reported  directly 
by  a  line  scanner  such  as  the  SICK™  laser  scanner.  This  is  done  by  collapsing 
into  the  plane  all  the  points  residing  within  the  slice,  which  is  defined  by  its 
height  above  the  ground. 


•V '  I  f  1 

V  '  »*  ' 

Points  included  in  slice 

Points  NOT  included  (discarded] 

(a) 

(b) 

The  ground  elevation  is  stored 
in  a  scrolling  grid  that  contains  ac¬ 
cumulated  LADAR  points  and  is 
centered  at  the  vehicle’s  current 
position.  The  points  are  weighted 
by  age,  more  recent  points  have  a 
heigher  weight.  The  mean  and  stan¬ 
dard  deviation  of  the  heights  of  all 
scan  points  that  are  inside  each  cell 
are  computed,  and  the  elevation  is 
then  calculated  by  subtracting  one 
standard  deviation  from  the  aver¬ 
age  height  of  all  the  points  in  the 
cell.  The  key  properties  of  this  sim¬ 
ple  algorithm  are  that  mean  and 
standard  deviations  can  be  calcu¬ 
lated  recursively,  and  that  the  el¬ 
evation  is  never  below  the  lowest 
point  while  still  having  about  80% 
of  the  points  above  ground. 

The  system  adapts  to  different  environments  by  varying  the  shape  of  the 
sensing  plane  i.e.,  by  adjusting  the  height  of  the  slice  from  which  points  are 
projected  onto  a  two-dimensional  plane.  Spurious  measurements  produced 


Fig.  1  Projection  of  virtual  scan  line,  (a)  A 
point  cloud  is  collected  from  the  environment 
shown,  (b)  The  points  located  within  a  certain 
height  above  ground  are  projected  into  a  2D 
plane,  and  processed  as  if  it  were  a  single  scan 
line.  The  resulting  projection  is  shown  in  (c), 
top  view 


4 


Luis  E.  Navarro- Serment,  Christoph  Mertz,  and  Martial  Hebert 


by  ground  returns  are  avoided  by  searching  for  measurements  at  a  constant 
height  above  the  ground.  Since  our  research  was  done  in  an  open  outdoor 
environment,  we  did  not  encounter  overhanging  structures  like  overpaths  or 
ceilings.  These  might  be  topics  of  future  research. 


3.2  Motion  Features 

After  detecting  and  tracking  objects  using  the  virtual  scan  line  we  can  com¬ 
pute  a  Motion  Score  (MS).  The  MS  is  a  measure  of  how  confident  the  algo¬ 
rithm  is  that  the  detected  object  is  a  human,  based  on  four  motion-related 
variables:  the  object’s  size,  the  distance  it  has  traveled,  and  the  variations  in 
the  object’s  size  and  velocity.  The  size  test  discriminates  against  large  objects 
like  cars  and  walls.  The  distance  traveled  test  discriminates  against  station¬ 
ary  objects  like  barrels  and  posts.  The  variation  tests  discriminate  against 
vegetation,  since  their  appearance  changes  a  lot  due  to  their  porous  and  flex¬ 
ible  nature.  The  individual  results  of  these  tests  are  scored,  and  then  used 
to  calculate  the  MS.  A  detailed  description  of  each  test  and  all  parameters 
involved  is  presented  in  [6]. 


3.3  Geometric  Features 

To  discriminate  against  static  structures,  we  compute  a  group  of  distinguish¬ 
ing  geometric  features  from  the  set  of  points  belonging  to  each  object  being 
tracked  in  2D,  and  then  feed  these  features  to  a  classifier,  which  determines 
whether  the  object  is  a  human  or  not.  This  concept  is  depicted  in  Fig.  2. 

As  shown  in  Fig.  2(a),  the  process  starts  when  a  point  cloud  is  read  from 
the  sensor.  We  define  Zj  =  {xi,  x2, . . . ,  x^}  as  the  set  of  N  points  contained 
in  a  frame  collected  at  time  ty ,  whose  elements  are  represented  by  Cartesian 
coordinates  x  =  ( x,y,z ).  The  points  corresponding  to  one  frame  are  shown, 
and  are  colored  according  to  their  height  above  ground.  To  avoid  the  com¬ 
putational  cost  of  processing  the  entire  point  cloud,  we  extract  a  2D  virtual 
slice,  as  described  in  Section  3.1  (Fig.  2(b)).  For  each  object  being  tracked, 
its  position,  velocity,  and  size  are  estimated  using  the  algorithm  described 
in  [6].  These  values  are  used  to  compute  the  MS.  The  object’s  position  and 
size  information  are  used  to  isolate,  from  the  original  point  cloud,  only  those 
points  corresponding  to  potential  humans,  as  shown  in  Fig.  2(c).  In  this  way, 
the  three-dimensional  information  corresponding  to  each  object  is  recovered 
in  the  form  of  smaller  sets  of  points.  At  this  point,  we  have  a  collection  of  M 
sets  {Si,  52, . . . ,  Sm},  where  Sie{i,2,...,M}  C  Zj.  A  feature  vector  is  computed 
from  each  of  these  sets  (Figs.  2(d)  -  (e)),  and  then  fed  to  a  classifier  that  de¬ 
termines  for  each  object  whether  it  is  a  human  or  not,  Fig.  2(f).  This  decision 
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is  made  for  each  object,  and  is  based  on  the  most  recent  set  of  points  collected 
from  the  sensor.  The  classifier  also  takes  into  account  the  information  used 
to  calculate  the  MS;  this  is  described  in  a  subsequent  section. 

A  set  of  features  is  computed  with  the  purpose  of  extracting  the  most 
informative  signatures  of  a  human  in  an  upright  posture  from  the  3D  data. 
The  legs  are  particularly  distinctive  of  the  human  figure,  so  the  algorithm 
computes  statistical  descriptions  from  points  located  around  the  legs.  Similar 
descriptions  are  computed  from  the  trunk  area,  representing  the  upper  body. 
Additionally,  the  moment  of  inertia  tensor  is  used  to  capture  the  overall 
distribution  of  all  points.  Finally,  to  include  the  general  shape  of  the  human 
figure,  we  compute  the  normalized  2D  histograms  on  two  planes  aligned  with 
the  gravity  vector. 


3.3.1  Feature  Extraction 

Let  Sk  =  {xi,  X2, . . . ,  xn}  be  the  set  of  points  belonging  to  the  object  k , 
whose  elements  are  represented  by  Cartesian  coordinates  x  =  (x,y,z).  A  set 
of  suitable  features  is  computed  from  Sfc,  as  depicted  in  Fig.  2(d),  which 
constitutes  a  profile  of  the  object. 

We  begin  by  performing  Principal  Component  Analysis  (PC A)  using  all 
the  elements  of  £&,  to  identify  the  statistical  patterns  in  the  three-dimensional 
data  (see  Fig.  3).  This  involves  the  subtraction  of  the  mean  m  from  each  of 
the  three  data  dimensions.  From  this  new  data  set  with  zero  mean,  we  calcu¬ 
late  the  covariance  matrix  E  E  5ft3  x  3 ,  and  the  normalized  moment  of  inertia 
tensor  M  E  5ft3 x 3 ,  treating  all  points  as  unit  point  masses: 


3D  points 


(a) 


Fig.  2  Improved  pedestrian  detection.  Geometric  features  present  in  subsets  of  the  point 
cloud  are  used  by  a  classifier  to  distinguish  pedestrians  from  static  objects. 
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£  = 


1 


n  —  1 


n 

£(xfe  -  m)(xfc  -  m)T 
k= 1 


M  = 


Efe=i(y|  +  2I)  -Y2=ixkVk  -Y2=1xkzk 

-E3U^*  ELi(4  +  4)  -ELiw^ 
L-ELi^  -ELi^  ELi(4  +  ^) 


Since  both  £  and  M  are  symmetric,  we  only  use  6  elements  from  each  as 
features. 

Resulting  from  the  PC  A  are  three  pairs  of  eigenvectors  and  eigenvalues, 
sorted  according  to  decreasing  eigenvalue.  Call  these  eigenvectors  ei,e2,e3, 
with  their  corresponding  eigenvalues  Ai  >  A2  >  A3.  We  assume  that  a  pedes¬ 
trian  is  in  an  upright  position,  so  the  principal  component  ei  is  expected 
to  be  vertically  aligned  with  the  person’s  body1.  Together  with  the  second 
largest  component  e2,  it  forms  the  main  plane  (Fig. 3,  center  top),  and  also 
forms  the  secondary  plane  with  the  smallest  component,  e3  (Fig. 3,  center 
bottom) .  We  then  transform  the  original  data  into  two  representations  using 
each  pair  of  components  ei,e2  and  ei,e3,  from  which  we  proceed  to  com¬ 
pute  additional  features  (the  third  possible  representation,  i.e.  using  the  two 
smallest  components  e2,e3,  is  not  used). 

We  focus  on  the  points  included  in  the  main  plane,  to  analyze  the  patterns 
that  would  correspond  to  the  legs  and  trunk  of  a  pedestrian,  as  shown  in 
Fig  3,  center  top.  These  zones  are  the  upper  half,  and  the  left  and  right 
lower  halves.  After  separating  the  points  into  these  zones,  we  calculate  the 
covariance  matrix  (in  2D)  over  the  transformed  points  laying  inside  each  zone. 
This  results  in  9  additional  features  (3  unique  values  from  each  zone). 

Finally,  we  compute  the  normalized  2D  histograms  for  each  of  the  two 
principal  planes  (Fig.  3,  right),  to  capture  the  shape  of  the  object.  We  use 
14  x7  bins  for  the  main  plane,  and  9x5  for  the  secondary  plane.  Each  bin  is 
used  as  a  feature,  so  there  are  143  features  representing  the  shape.  A  total  of 
164  geometric  features  are  determined  for  each  object. 


3.4  Human  Detection 

A  classifier  (Fig.  2(f)),  composed  of  two  independent  Support  Vector  Ma¬ 
chines  (SVM)  [2],  determines  for  each  object  whether  it  is  human  or  not.  The 
first  classifier  is  a  SVM  that  receives  the  vector  of  164  geometric  features 
computed  directly  from  S*.,  and  scores  how  closely  the  set  matches  a  human 


1  Dealing  with  the  violation  of  this  assumption  is  the  focus  of  current  research 
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Main  plane  (largest  eigenvectors) 


-0.5  -0.5 

Fig.  3  A  set  of  geometric  features  is  computed  from  a  the  set  of  points  belonging  to  an 
object. 


shape.  We  call  this  the  Geometric  Score  (GS).  The  GS  is  particularly  effec¬ 
tive  for  detecting  static  pedestrians.  Similarly,  the  features  used  to  determine 
the  MS  (i.e.  object’s  size,  the  distance  it  has  traveled,  and  the  corresponding 
size  and  motion  noises)  contain  valuable  information  about  the  motion  of 
the  target.  Together  with  the  GS,  these  features  are  fed  to  a  second  SVM, 
whose  output  represents  the  distance  to  the  decision  surface  of  the  SVM.  The 
Strength  of  Detection  (SOD),  the  total  measure  of  how  strongly  the  algorithm 
rates  the  object  as  being  a  human,  is  calculated  as  the  logistic  function  of 
the  distance  to  the  decision  surface.  This  number  is  reported  for  each  object. 
If  the  GS  cannot  be  computed  (e.g.  insufficient  data  from  a  distant  target, 
or  violation  of  the  upright  position  assumption),  then  the  MS  is  reported  as 
the  SOD  for  that  object. 


3.4.1  Training 

We  trained  the  GS  classifier  using  a  combination  of  simulated  and  real  ex¬ 
amples.  Because  it  is  impossible  to  collect  enough  real  data  to  evaluate  per¬ 
ception  algorithms  in  all  possible  situations,  we  have  created  a  simulator 
capable  of  producing  synthetic  examples  of  sensor  data.  The  simulator  uses 
a  ray  tracing  engine  to  generate  a  set  of  ray  intersections  between  sensor  and 
the  objects  in  the  scene  to  simulate.  This  information  is  then  used  to  pro¬ 
duce  synthetic  LADAR  measurements  according  to  a  set  of  parameters  for  a 
particular  sensor,  as  shown  in  Fig.  4.  We  trained  the  GS  classifier  using  over 
3500  examples  (27.4%  humans,  72.6%  non- humans).  The  human  set  included 


8 


Luis  E.  Navarro- Serment,  Christoph  Mertz,  and  Martial  Hebert 


62%  of  simulated  examples.  The  second  classifier  was  trained  using  only  real 
examples,  since  the  motion  and  size  noises  used  to  determine  the  MS  are  of  a 
dynamic  nature  and  consequently  harder  to  simulate  efficiently  (over  46000 
examples:  6%  humans,  94%  non- humans). 


Fig.  4  Simulated  target  (left),  and  its  corre¬ 
sponding  point  cloud  (right). 


We  trained  both  SVMs  using 
a  five-fold  cross  validation  proce¬ 
dure.  We  found  that  both  radial 
basis  function  (RBF)  and  poly¬ 
nomial  kernels  resulted  in  simi¬ 
lar  levels  of  classification  perfor¬ 
mance.  After  multiple  tests,  we 
determined  that  a  RBF  kernel 
was  the  best  for  the  calculation 
of  the  GS,  while  a  polynomial 
kernel  was  preferred  for  the  sec¬ 
ond  classifier. 


4  EXPERIMENTAL  RESULTS 


This  section  presents  the  results  of  several  experimental  runs.  These  results 
were  obtained  from  field  tests  performed  on  a  Demo  III  XUV  [7].  The  data 
comes  from  14  different  runs,  where  the  variations  include  static  and  moving 
vehicles,  pavement  and  off-road  driving,  and  pedestrians  standing,  walking, 
or  jogging.  The  data  was  taken  at  17  Hz,  and  the  average  duration  of  each  run 
was  about  1  minute.  There  were  altogether  48  humans  and  1075  non-human 
objects,  where  those  who  came  in  and  out  of  the  field-of-view  were  counted 
twice.  The  ground  truth  was  produced  by  labelling  the  data  by  hand. 

In  the  upper  part  of  Fig.  5  the  ROC  curve  and  the  precision-recall  curves 
are  shown.  Each  human  in  one  cycle  is  a  positive  example  and  each  non¬ 
human  object  in  one  cycle  is  a  negative  example.  There  are  about  6300  pos¬ 
itive  and  60000  negative  examples.  These  plots  illustrate  the  current  perfor¬ 
mance  of  our  system.  The  blue  traces  indicate  the  MS  score,  which  is  our 
previous  detection  algorithm.  The  red  traces  indicate  the  geometric  score, 
i.e.  the  classification  using  the  geometric  features  computed  directly  from 
the  object’s  set  of  3D  points,  but  without  any  motion  clues.  As  seen  in  the 
plots,  neither  algorithm  by  itself  outperforms  the  other  throughout  the  entire 
operational  range.  For  low  false  positive  rates  the  GS  is  better  and  at  high 
false  positive  rates  MS  is  better.  As  we  mentioned  earlier,  the  MS  only  works 
for  static  humans  at  high  false  positive  rates.  The  synergistic  combination  of 
both  results  has  significantly  better  performance,  as  indicated  by  the  black 
traces. 

An  alternative  representation  of  ROC  and  precision-recall  is  shown  in  the 
lower  part  of  Fig.  5,  where  each  object  is  counted  per  run.  The  score  of  an 


Pedestrian  Detection  and  Tracking  Using  Three-Dimensional  LADAR  Data 


9 


false  positive  rate  [FP/neg] 


Fig.  5  The  plots  on  the  left  show  the  precision-recall  and  the  ones  on  the  right  the  ROC 
curves.  Shown  are  the  curves  produced  by  using  the  geometric  score  (red  dashed),  the 
motion  score  (blue  dot-dash)  and  the  combined  (black  solid).  For  the  upper  curves  each 
object  in  each  cycle  and  in  the  lower  curves  each  object  in  the  full  run  is  counted  as  one 
example. 


object  is  the  mean  of  the  score  over  all  cycles,  with  a  minimum  of  10  cycles. 
As  mentioned  above,  there  are  48  humans  (=  positive  examples)  and  1075 
other  objects  (negative  examples).  A  noteworthy  operating  point  is  where 
there  are  basically  no  false  positives  (rate  is  10-3)  and  still  the  true  positive 
rate  is  0.75. 

We  identified  several  cases  where  performance  is  decreased.  Typically,  par¬ 
tial  views  of  a  human  (e.g.  only  the  upper  body  and  part  of  the  legs  are  seen 
by  the  sensor)  result  in  false  negative  detections.  Also,  false  detections  occur 
when  a  target  is  only  partially  inside  the  sensor’s  field  of  view.  Similarly, 
pedestrians  in  non-upright  positions  usually  result  in  false  negative  detec¬ 
tions.  This  is  expected,  because  of  the  particular  attention  paid  to  legs  and 
torso  when  extracting  geometric  features.  An  exception  to  this  is  the  case 
of  kneeling  humans,  which  we  have  been  able  to  detect  consistently,  though 
they  are  usually  borderline  classified  as  humans.  Solving  these  problems  is 
the  focus  of  current  research. 
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5  CONCLUSION 

We  described  a  pedestrian  detection  and  tracking  system  using  only  three- 
dimensional  data.  The  approach  uses  geometric  and  motion  features  to  rec¬ 
ognize  human  signatures,  and  clearly  improves  the  detection  performance 
achieved  in  our  previous  work.  The  set  of  features  used  to  determine  the  hu¬ 
man  and  motion  scores  was  designed  to  detect  humans  in  upright  positions. 
To  increase  the  robustness  of  detection  of  humans  in  other  postures,  in  fu¬ 
ture  research  we  will  investigate  ways  of  extracting  signatures  from  the  point 
cloud  that  are  highly  invariant  to  deformations  of  the  human  body. 
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Passive,  long-range  detection  of  Aircraft: 
Towards  a  field  deployable  Sense  and  Avoid 
System 

Debadeepta  Dey,  Christopher  Geyer,  Sanjiv  Singh,  Matt  Digioia 


1  Introduction 

Unmanned  Aerial  Vehicles  (UAVs)  typically  fly  blind  with  operators  in  distant  loca¬ 
tions.  Most  UAVs  are  too  small  to  carry  a  traffic  collision  avoidance  system  (TCAS) 
payload  or  transponder.  Collision  avoidance  is  currently  done  by  flight  planning,  use 
of  ground  or  air  based  human  observers  and  segregated  air  spaces.  US  lawmakers 
propose  commercial  unmanned  aerial  systems  access  to  national  airspace  (NAS)  by 
30th  September  2013.  UAVs  must  not  degrade  the  existing  safety  of  the  NAS,  but 
the  metrics  that  determine  this  have  to  be  fully  determined  yet.  It  is  still  possible  to 
state  functional  requirements  and  determine  some  performance  minimums.  For  both 
manned  and  unmanned  aircraft  to  fly  safely  in  the  same  airspace  UAVs  will  need  to 
detect  other  aircraft  and  follow  the  same  rules  as  human  pilots. 

Key  specifications  of  the  international  committee  F38  on  UAS  systems  standard 
F241 1-04  (1)  proposed  requirements  which  include  a  field  of  regard  of  220°  (horizontal) 
x  30° (vertical),  minimum  detection  range  of  3  statute  miles  under  visual  flight  rules 
and  a  required  miss  distance  of  500  ft.  Without  this  capability,  widespread  utilization 
of  UAVs  will  not  be  possible. 

This  paper  focuses  on  the  sensing  of  aircraft  with  passive  vision.  Small  size,  low 
weight  and  power  requirement  make  cameras  attractive  for  this  application.  Multiple 
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Fig.  1  Selection  of  1 1  x  11 
subwindows  showing  the 
image  of  the  Piper  Archer  II 
which  was  used  as  the  intruder 
aircraft  for  collecting  imagery, 
at  a  range  of  1.5  miles.  The 
camera  and  lens  used  had  0.41 
milliradian/pixel  resolution 
and  a  field  of  view  of  30°  (H) 
x  21°(V). 

cameras  can  be  used  to  cover  the  wide  field-of-regard.  A  typical  image  of  an  aircraft 
at  a  range  of  3  miles  is  a  few  pixels  in  diameter.  Fig.  1  shows  a  1  lvl  1  window  around 
the  image  of  the  intruder  aircraft  at  various  ranges.  Part  of  the  challenge  in  detecting 
aircraft  in  such  a  wide  field  of  regard  reliably  is  the  low  signal  to  background  ratio. 
Active  sensors  like  radar  are  not  feasible  because  of  their  prohibitive  power  and  size 
requirements  (2)  for  UAVs.  Passive  vision  provides  a  low  cost,  low  power  solution 
albeit  at  the  cost  of  a  relatively  high  false  positive  rate.  Using  an  approach  based 
on  morphological  filters  augmented  with  a  trained  classifier  we  have  been  able  to 
obtain  98%  detection  rate  out  to  5  statute  miles  and  a  false  positive  rate  of  1  every 
50  frames. 

In  section  2  the  related  work  in  sense  and  avoid  systems  is  outlined.  In  section 

3  we  discuss  the  details  of  the  vision  based  aircraft  detection  algorithm.  In  section 

4  we  outline  our  efforts  to  collect  imagery  of  flying  aircraft,  and  details  about  the 
result  of  our  algorithm  on  the  corpus  of  real  ground  truthed  imagery  of  aircraft. 
Finally  in  section  5  we  discuss  the  path  forward  towards  a  field  deployable  sense 
and  avoid  system. 
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2  Related  Work 

Utt  et  al.  (20)  describe  a  fielded  vision-based  sensory  and  perception  system.  Mc- 
Candless  (14)  proposes  an  optical  flow  method  for  detecting  moving  aircraft.  The 
use  of  morphological  filtering  is  popular  in  computer  vision  based  sense  and  avoid 
systems  (11;  4).  This  approach  generates  a  significant  number  of  false  positives. 
Petridis  et  al.  use  AdaBoost  (17)  to  detect  aircraft  in  low  resolution  imagery.  Track- 
Before-Detect  (TBD)  (9)  is  an  approach  used  mainly  on  infrared  imagery.  Defence 
Research  Associates  have  implemented  a  vision  based  sense  and  avoid  system  on  a 
Predator  UAV  system  (13). 

Efforts  to  directly  model  the  range  of  atmospheric  conditions  under  VFR  remain 
untouched.  A  field  deployable  sense  and  avoid  system  must  be  able  to  operate  in  a 
variety  of  atmospheric  conditions  including  fog,  haze  and  directly  against  the  glare 
of  the  sun.  The  operation  of  the  system  must  not  degrade  beyond  an  acceptable  level 
under  all  these  conditions.  We  have  developed  an  image  formation  model  which  ac- 
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Fig.  2  The  minimum  de¬ 
tection  distance  required  to 
guarantee  collision  avoidance 
for  varying  intruder  speeds 
and  ownship  speeds  of  40,  50 
and  60  knots.  For  the  best  case 
scenario  the  minimum  detec¬ 
tion  distance  is  700  meters 
and  for  the  worst  case  sce¬ 
nario  the  minimum  detection 
distance  is  2100  m 

counts  for  the  various  atmospheric  conditions.  We  used  this  model  to  predict  the 
signal  to  background  ratio  of  the  image  of  the  aircraft.  The  image  formation  model 
is  described  in  detail  in  (12).  The  model  also  allows  the  determination  of  the  suit¬ 
ability  of  any  sensor  combination  before  using  the  sensor  and  also  to  determine 
the  minimum  derived  resolution  for  achieving  a  specified  performance.  The  perfor¬ 
mance  of  the  image  formation  model  has  been  validated  by  the  vast  corpus  of  real 
imagery  of  flying  aircraft  that  we  collected  during  the  course  of  this  project. 


2.1  Requirements 


The  range  requirements  of  an  aircraft  detection  system  are  influenced  by  two  main 
factors:  regulations  developed  by  the  FAA  and  the  maneuvering  capabilities  of  the 
UAV. 

Duke  et  al.(6)  and  Schaeffer  et  al.(18)  outline  core  competencies  needed  by  a  hu¬ 
man  equivalent  system.  The  human  equivalence  mandated  by  OSD  (15)  and  ACC 
(7)  require  vehicles  to  avoid  non-cooperative  vehicles  without  such  systems.  Shak- 
ernia  et  al.  (19)  leverage  work  of  Utt  et  al.  (20)  on  using  maneuvers  to  reduce  the 
intrinsic  uncertainty  about  range  when  using  an  image  based  detector. 

In  order  to  decide  the  range  requirements  of  the  system  we  opted  for  a  colli¬ 
sion  avoidance  system  by  Frazzoli  et  al.  (10)  that  enables  an  UAV  to  aggressively 
maneuver  without  breaching  its  envelope.  Further  details  are  in  (12). 


3  Approach 


We  experimented  with  a  number  of  different  approaches  to  detecting  small  targets 
with  low  signal  to  background  ratios  with  an  emphasis  on  methods  that  have  both 
high  detection  rates  and  low  computational  complexity. 

We  have  developed  a  multi-stage  method  that  starts  with  a  large  number  of  can¬ 
didates  and  winnows  these  down.  We  start  with  a  morphological  filter  that  looks 
for  high  contrast  regions  in  the  image  that  are  most  likely  to  be  aircraft.  Next  we 
use  a  classifier  that  has  been  trained  on  positive  and  negative  examples  and  finally 
we  track  the  candidates  over  time  to  remove  false  positives.  Below  we  discuss  each 
’’stage”  of  detection  in  detail. 
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3.1  Stage  1:  Morphological  Filtering 


In  the  first  stage,  we  apply  a  morphological  filter  that  detects  deviations  from  the 
background  intensity.  We  use  two  types,  one  favors  dark  targets  against  lighter  back¬ 
grounds  (positive),  and  the  other  favors  light  targets  against  darker  backgrounds 
(negative).  The  positive  morphological  filter  takes  the  form: 


JZ+(x,y)  =  y(x,y)  —  max{  max  min  )(*  +  /  +  j,y)y 

|*f<w  |jf|<w 

max  min  J?  (x,  y  +  i  +  / ) } 

\i\<w\j\<w 


(1) 


As  long  as  no  2w  +  1  sub-window  (we  used  w  =  2)  contains  all  target  pixels 
(higher  intensity)  and  no  background  pixels  (lower  intensity),  then  all  sub- windows 
will  contain  at  least  one  (darker)  background  pixel.  Since  the  background  could  be 
noisy,  the  max’s  have  the  effect  of  finding  a  conservative  greatest  lower-bound  for 
the  background  intensity.  The  difference,  then,  yields  an  estimate  of  the  difference 
between  signal  and  background  for  the  pixel.  The  negative  morphological  filter, 
,  swaps  min’s  for  max’s,  and  negates  the  expression.  From  we  choose 
the  top  n+  pixels  above  a  threshold  T+,  while  suppressing  local  non-maxima,  and 
construct  a  list  of  detections.  We  do  the  same  for  Fig. 3  shows  an  example 
aircraft  image  and  the  result  of  the  morphological  filtering  on  the  example  image. 


Fig.  3:  The  image  on  the  left  shows  part  of  the  image  of  the  Piper  Archer  II  at  a  range  of  2.87  miles. 
The  image  on  the  right  shows  the  result  of  the  morphological  operation  of  the  left  image  in  Stage 
1  of  the  processing  pipeline.  The  dark  aircraft  image  shows  up  as  bright  white  spot. 


3.2  Stage  2:  Construction  of  a  Shape  Descriptor 


For  each  detection  we  fit  a  Gaussian  function  to  its  (2r  +  1)  x  (2r  +  1)  sub- window 
(we  settled  on  r  =  7)  and  construct  a  shape  descriptor  for  the  detection.  Through  trial 
and  error  we  found  a  descriptor  that  was  a  good  discriminator  in  test  sequences. 
The  descriptor  encodes  the  parameters  of  the  fitted  Gaussian,  as  well  as  statistics 
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Fig.  4:  The  image  on  the  left  shows  an  image  of  the  aircraft  at  2.87  miles.  The  figure  on  the  right 
shows  the  two  dimensional  Gaussian  function  fitted  on  the  15x15  subwindow  centered  on  the 
image  of  the  aircraft  in  the  right  image  in  Stage  2  of  the  processing  pipeline. 

computed  from  the  residual  image.  We  use  an  axis-aligned  Gaussian,  parameterized 
as  follows: 


(2) 


We  center  the  Gaussian  at  the  the  pixel  with  the  largest  absolute  deviation  from  the 
window’s  mean  intensity.  We  use  gradient  descent  to  minimize  the  sum  of  square 
errors  between  the  input  sub- window  and  Sf  ( ■ • ;  £ ) ,  minimizing  over  £,  =  {Gx,Gy,b,s). 
To  do  this  efficiently,  we  avoid  repeated  calls  to  the  exponential  function  by  pre- 
computing  both  a  set  of  template  ’s  over  a  range  of  ox  and  oy  pairs,  with  (b,s)  = 
(0, 1),  and  a  set  of  finite  difference  approximations  to  the  partial  derivatives  of  Sf 
with  respect  to  ox  and  oy.  Fig.4  shows  an  example  aircraft  image  and  the  fitted  two 
dimensional  Gaussian  window  centered  on  the  image  of  the  aircraft. 

Using  the  best  fitting  Gaussian  Sf  *,  we  compute  a  shape  descriptor  from  the  resid¬ 
ual  difference  between  the  input  image  and  Sf  *  in  upper-left  (UL),  upper-right  (UR), 
lower-left  (LL),  lower-right  (LL),  and  center  (C)  regions.  We  construct  both  positive 
and  negative  half- sign  sums.  For  example: 


=  T,i<x<w  max[0 ,&*(x,y)  -  J[x,y)\ 

\<y<w 


sc  =  ^ w/2<x<3w/2  min[0,Sf*(x,;y)  -  J?(x,y)] 


w/2<x<3w/2 


Then,  we  construct  min’s  and  max’s  of  positive  and  negative  half-sign  sums,  e.g., 
^max  =  max  (Sul,  . . .  ,Sj) ,  and  for  each  statistic  we  take  its  log  normalized  by  the 
background  intensity  b,  e.g.,  s+ax  =  log  (S+ax/Z?).  We  also  compute  the  estimated 
signal  to  background  ratio: 


\b\  +  \s\/27taxGy 
\b\ 


(3) 
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Finally,  the  shape  descriptor  we  use  is: 


d=  (b,s,ox,oy,SBRy 


'min  ’  max  5 13 min  ’  max  5 


0+  0+ 


^UL’  °UR’  °LL’  °LR’  ’  °UL’  °UR’  °LL’  °LR’  ) 


We  associate  this  19-dimensional  vector  with  each  detection. 


3.3  Stage  3:  SVM-based  Classification  of  Potential  Targets 


The  next  stage  of  the  algorithm  is  to  pass  the  computed  descriptor  f  for  each  detec¬ 
tion  through  a  support  vector  machine  (SVM)  (5).  We  trained  the  SVM  using  shape 
descriptors  computed  for  positive  and  negative  examples  taken  from  a  sequence  of 
hand-labeled  images.  For  negative  examples  we  used  the  false  negatives  produced 
by  the  morphological  filter.  We  used  OpenCV’s  (3)  implementation  of  an  SVM  (3), 
and  chose  to  use  radial  basis  functions  for  the  classifier. 

OpenCV’s  SVM  implementation  returns  a  hard  classification:  positive  or  nega¬ 
tive  depending  on  the  sign  of  a  summation,  e.g.  y  =  sign  x,  where  x  =  TLifi{ d)-  We 
want  to  associate  a  probability  with  each  detection,  so  as  to  make  them  compara¬ 
ble,  and  so  we  modified  the  implementation  to  use  the  value  of  x  to  estimate  the 
probability  that  the  detection  is  from  a  true  target.  During  training  we  construct  em¬ 
pirical  densities  of  x  for  positive  (/?+)  and  negative  (p~)  classes  using  a  mixture  of 
Gaussians,  and  store  a  log-likelihood  ratio  function  l{x)  =  log p+  / p~  in  a  look-up 
table  keyed  on  x.  We  choose  the  kernel  bandwidth  just  large  enough  to  make  the 
odds  monotonic  in  x.  We  keep  only  those  detections  whose  odds  exceed  a  minimum 
value  of  pm in. 


3.4  Stage  4:  Tracking 

The  purpose  of  this  stage  is  to  track  detections  over  time,  associating  detections  to  a 
list  of  tracked  targets.  Since  many  of  the  false  positives  are  intermittent,  we  also  use 
tracking  to  reduce  the  false  positive  rate.  We  arrived  at  a  simple  procedure  for  target 
tracking  that  provides  a  full  screen  tracking  system  for  high  definition  imagery. 

First,  we  always  maintain  a  list  of  targets,  and  in  steady  state,  it  is  the  job  of 
the  tracker  to  associate  to  every  existing  target  a  detection.  With  any  remaining 
detections,  it  also  decides  whether  to  create  new  targets. 

For  each  existing  target  we  consider  a  set  of  candidate  detections,  which  are 
chosen  from  a  wide  search  area  around  the  predicted  position  of  the  target.  For  each 
potential  matching  detection,  we  evaluate  the  likelihood  that  the  target  and  detection 
are  associated  given  their  respective  descriptors. 

Then,  given  a  list  of  the  likelihoods  for  the  possible  pairings  we  construct  a  graph 
with  a  node  for  each  target  and  each  detection,  and  edges  between  possible  pairings, 
where  the  weights  are  the  log  likelihoods  of  pairings. 

We  construct  a  cost  matrix,  whose  rows  correspond  to  targets,  columns  to  detec¬ 
tions,  and  entries  are  the  log  likelihoods  of  the  potential  pairing,  with  —  oo  given  to 
non-candidate  pairings.  The  goal  is  to  choose  entries  from  the  matrix,  no  more  than 
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Fig.  5  A  manned  aircraft 
equipped  with  a  GPS  was 
flown  in  a  series  of  flights 
such  that  it  was  in  the  field 
of  view  of  the  ground  based 
cameras.  The  circles  show 
the  distance  to  the  cameras  in 
miles. 


10  miles 


one  from  every  row  and  no  more  than  one  from  every  column  such  that  the  sum 
of  probabilities  is  a  maximum.  We  use  the  Hungarian  algorithm  (16)  to  find  this 
matching.  For  the  number  of  targets  we  typically  have,  usually  less  than  200,  this 
computation  can  be  computed  in  less  than  8  milliseconds. 


4  Data  Collection  and  Results 

Collaborating  with  the  Penn  State  Electro-Optics  Center  (EOC),  we  collected  ground 
to  air  imagery  of  aircraft  with  ten  different  camera/lens  combinations.  Four  different 
infra-red  cameras  were  also  used  to  acquire  imagery.  The  EOCs  Payload  Develop¬ 
ment  Center,  a  comprehensive  aerial  systems  integration  lab  (located  on  the  field  at 
the  Jimmy  Stewart  Airport,  Indiana,  PA),  was  utilized  to  provide  hardware,  testing 
and  technical  support,  and  flight  operations. 

We  tracked  a  Piper  Archer  II  flying  in  the  air  from  the  ground  with  cameras 
mounted  on  pan  tilt  unit  and  synchronised  with  the  geolocation  of  the  aircraft  so 
that  it  always  remained  in  field  of  view  of  the  cameras. 

In  Fig. 5  we  show  the  pattern  the  intruder  aircraft  flew  as  we  gathered  imagery. 
Till  date  we  have  collected  2.5  terabytes  of  imagery  of  which  in  2  terabytes  the 
position  of  the  aircraft  has  been  picked  out  manually  for  ground  truth  purposes.  This 
corpus  of  real  imagery  has  been  used  to  analyze  the  performance  of  our  algorithm. 

We  evaluated  the  performance  of  each  stage  of  the  algorithm  using  receiver  oper¬ 
ator  characteristic  curves  (ROC)  curves,  which  measure  specificity  (  ability  to  reject 
outliers  )  and  sensitivity  (  ability  to  detect  true  target  )  of  a  detector  on  about  2 
terabytes  of  imagery  of  above  the  horizon  flying  aircraft. 

Stage  3  improves  the  false  positive  rate  by  a  factor  between  6  and  17  depending 
on  the  detection  rate  over  Stage  1.  Refer  Table.  1  .  We  get  a  vast  improvement  with 
tracking  in  Stage  4.  In  the  case  of  both  Stage  1  and  Stage  3  the  variable  affecting 
rates  is  a  threshold.  For  Stage  1,  the  threshold  is  the  value  returned  by  the  morpho¬ 
logical  filter  at  the  detection.  For  Stage  3,  the  threshold  is  the  probability  according 
to  the  SVM  classifier,  that  the  detection  is  a  target. 

Fig. 6  on  the  left  shows  the  ROC  curve  for  Stage  1,  Stage  3  and  Stage  4  of  the 
algorithm.  Whereas  before  the  value  affecting  performance  was  a  threshold  on  the 
output  of  a  filter  or  classifier,  in  this  case  the  threshold  is  the  number  of  frames  for 
which  a  target  has  been  tracked.  It  is  to  be  noted  that  the  best  overall  detection  rate 
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Table  1 :  Shows  the  number  of  false  positives  per  frame  for  Stage  1  and  Stage  3  as  a  function  of  the 
true  positive  percentage.  Stage  3  reduces  the  false  positive  rate  by  a  factor  between  6  and  17  times 
as  Stage  1 


TP  % 

Stage  1 
FP/frame 

Stage  3 
FP/frame 

FP  Reduction 
Factor 

95% 

120 

20 

5.9x 

90% 

66 

3.9 

17x 

80% 

14 

1.0 

14x 

70% 

8 

0.66 

12x 

60% 

6.2 

0.56 

1 1  X 

Fig.  6:  Figure  on  the  left  shows  the  ROC  curve  for  true  positive  and  false  positive  for  the  3  main 
stages  of  the  algorithm  on  4  mp  imagery.  The  curve  for  Stage  4  shows  almost  perfect  detection 
rate  with  a  false  positive  rate  of  as  low  as  0.02  per  frame.  Figure  on  the  right  shows  the  effect  of 
varying  the  minimum  number  of  frames  that  a  potential  target  has  to  be  tracked  for  before  being 
declared  as  a  target  in  Stage  4.  This  curve  is  very  flat  as  most  true  positives  have  long  tracks  and 
false  positives  have  short  tracks. 

of  Stage  4  is  higher  than  the  best  overall  detection  rate  of  Stage  3,  even  though  it 
is  based  on  the  output  of  Stage  3.  We  believe  that  this  is  a  temporal  effect,  in  that 
detections  that  are  intermittently  below  threshold,  are  picked  up  by  the  tracker.  The 
detection  rate  decreases  slightly  at  closer  ranges.  This  is  due  to  the  fact  that  the 
algorithms  were  not  optimized  for  close  ranges. 

Fig. 6  on  the  right  shows  the  effects  of  the  variance  of  the  minimum  number 
of  frames  that  a  potential  target  has  to  be  tracked  for  before  being  declared  as  a 
target.  The  points  on  the  curve  are  the  number  of  frames  that  a  target  has  to  have 
been  tracked  for,  for  it  to  be  declared  a  possible  target.  In  our  experiments  we  let 
this  threshold  go  up  to  30  frames,  at  which  point  the  false  positive  rate  was  0.014 
FP/frame  and  detection  rate  was  92%.  It  is  to  be  noted  that  this  curve  is  very  flat. 
Most  of  the  true  positives  have  long  tracks  and  almost  all  outliers  have  short  tracks. 

Overall  there  is  a  significant  decrease  in  the  number  of  false  positives  per  frame. 
We  add  to  Table.  1  the  results  of  Stage  4  and  present  them  in  Table.2.  The  entries 
for  detection  rates  below  95%  are  not  filled  in  because  we  chose  not  to  evaluate  the 
threshold  frames  beyond  30.  If  we  had,  the  detection  rate  would  have  eventually 
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fallen.  We  find  that  tracking  in  Stage  4  improves  the  false  positive  rate  by  a  factor 
of  over  500  times  over  Stage  3. 


Table  2:  Performance  of  various  stages  of  the  algorithm.  Stage  4  achieves  a  false  positive  reduction 
rate  of  571  times  over  Stage  1 


TP  % 

Stage  1 
FP/frame 

Stage  3 
FP/frame 

Stage  4  FP/frame 

FP  Reduction 
Factor 

97% 

- 

- 

7.3 

- 

95% 

120 

20 

0.035 

571  x 

90% 

66 

3.9 

- 

- 

80% 

14 

1.0 

- 

- 

70% 

8 

0.66 

- 

- 

60% 

6.2 

0.56 

- 

- 

We  found  a  reasonable  compromise  in  false  positive  and  true  positive  rate  when 
we  insisted  that  targets  be  tracked  for  at  least  10  frames.  Then  the  overall  detection 
rate  was  95%,  the  false  positive  rate  was  0.05  false  positives  per  frame.  The  detection 
rate  is  nearly  100%  between  2.5  and  3.75  miles. 

About  80%  of  the  false  positives  that  made  it  through  the  tracking  of  at  least 
10  frames  were  items  that  are  of  interest  to  collision  avoidance.  Most  of  the  false 
positives  were  birds  or  landmarks  on  the  ground  that  were  not  segmented  out  by  the 
horizon  detector  (e.g.  an  antenna  in  the  distance).  These  targets  are  of  interest  and 
could  be  considered  useful. 

We  have  developed  and  demonstrated  a  vision  based  algorithm  that  achieves  a 
reasonable  false  positive  rate  of  approximately  98%  out  to  5  statute  miles  and  a 
false  positive  rate  of  1  in  every  50  frames  which  exceeds  the  FAA  (8)  regulatory 
requirement  of  reliable  detection  out  to  3  statute  miles. 


5  Future  Work 

Currently,  our  system  detects  bearing  to  targets  that  must  be  avoided.  An  important 
extension  will  be  to  estimate  range  to  the  target  so  that  precise  maneuvers  can  be 
planned.  We  are  currently  investigating  active  ranging  systems  that  can  be  pointed 
at  potential  targets,  to  estimate  range  and  further  reduce  false  positives. 

Fusing  infrared  imagery  with  visible  spectrum  imagery  and  collecting  below 
horizon  imagery  are  areas  we  need  to  address. 

Although  the  current  algorithm  takes  about  0.8  seconds  per  4  mp  frame  on  an 
AMD  Athlon  X2  3800+  processor,  most  of  the  computation  is  image  processing  and 
hence  amenable  to  parallelization.  Specialized  hardware  like  Digital  Signal  Proces¬ 
sors  are  promising. 

All  of  the  above  issues  affect  how  a  collision  detection  and  warning  system 
should  be  designed  so  as  to  cover  the  desired  field-of-regard. 
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Multiclass  Multimodal  Detection  and  Tracking 
in  Urban  Environments 


Luciano  Spinello,  Rudolph  Triebel,  and  Roland  Siegwart 


Abstract  This  paper  presents  a  novel  approach  to  detect  and  track  pedestrians  and 
cars  based  on  the  combined  information  retrieved  from  a  camera  and  a  laser  range 
scanner.  Laser  data  points  are  classified  using  boosted  Conditional  Random  Fields 
(CRF),  while  the  image  based  detector  uses  an  extension  of  the  Implicit  Shape 
Model  (ISM),  which  learns  a  codebook  of  local  descriptors  from  a  set  of  hand- 
labeled  images  and  uses  them  to  vote  for  centers  of  detected  objects.  Our  extensions 
to  ISM  include  the  learning  of  object  sub-parts  and  template  masks  to  obtain  more 
distinctive  votes  for  the  particular  object  classes.  The  detections  from  both  sen¬ 
sors  are  then  fused  and  the  objects  are  tracked  using  an  Extended  Kalman  Filter 
with  multiple  motion  models.  Experiments  conducted  in  real-world  urban  scenarios 
demonstrate  the  usefulness  of  our  approach. 


1  Introduction 


One  research  area  that  has  turned  more  and  more  into  the  focus  of  interest  during 
the  last  years  is  the  development  of  driver  assistant  systems  and  (semi-)autonomous 
cars.  In  particular,  such  systems  are  designed  for  operation  in  highly  unstructured 
and  dynamic  environments.  Especially  in  city  centers,  where  many  different  kinds 
of  transportation  systems  are  encountered  (walking,  cycling,  driving,  etc.),  the  re¬ 
quirements  for  an  autonomous  system  are  very  high.  One  key  prerequisite  for  such 
systems  is  a  reliable  detection  and  distinction  of  dynamic  objects,  as  well  as  an  ac¬ 
curate  estimation  of  their  motion  direction  and  speed.  In  this  paper,  we  address  this 
problem  focusing  on  the  detection  and  tracking  of  pedestrians  and  cars.  Our  system 
is  a  robotic  car  equipped  with  cameras  and  a  2D  laser  range  scanner.  As  we  will 
show,  the  use  of  different  sensor  modalities  helps  to  improve  the  detection  results. 
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The  system  we  present  here  employs  a  variety  of  different  methods  from  machine 
learning  and  computer  vision,  which  have  been  shown  to  provide  good  detection 
rates.  We  extend  these  methods  obtaining  substantial  improvements  and  combine 
them  into  a  complete  system  of  detection,  sensor  fusion  and  object  tracking.  We  use 
supervised-learning  techniques  for  both  kinds  of  sensor  modalities,  which  extract 
relevant  information  from  large  hand-labeled  training  data  sets.  In  particular,  the 
major  contributions  of  this  work  are: 

•  Several  extensions  to  the  vision  based  object  detector  by  Leibe  et  al.  [13]  using  a 
feature  based  voting  scheme  denoted  as  Implicit  Shape  Models  (ISM).  Our  major 
improvements  to  ISM  are  the  subdivision  of  objects  into  sub-parts  to  obtain  a 
more  differentiated  voting,  the  use  of  template  masks  to  discard  unlikely  votes, 
and  the  definition  of  superfeatures  that  exhibit  a  higher  evidence  of  an  object’s 
occurrence  and  are  more  likely  to  be  found. 

•  The  application  and  combination  of  boosted  Conditional  Random  Fields  (CRF) 
for  classifying  laser  scans  with  the  ISM  based  detector  using  vision.  We  use  an 
Extended  Kalman  Filter  (EKF)  with  multiple  motion  models  to  fuse  the  sensor 
information  and  to  track  the  objects  in  the  scene. 

This  paper  is  organized  as  follows.  The  next  section  describes  work  that  is  related 
to  ours.  Sec.  3  gives  a  brief  overview  of  our  overall  object  detection  and  tracking 
system.  In  Sec.  4,  we  introduce  the  implicit  shape  model  (ISM)  and  present  our 
extensions.  Sec.  5  describes  our  classification  method  of  2D  laser  range  scans  based 
on  boosted  Conditional  Random  Fields.  Then,  in  Sec.  6  we  explain  our  EKF-based 
object  tracker.  Finally,  we  present  experiments  in  Sec.  7  and  conclude  the  paper. 


2  Related  Work 

Several  approaches  can  be  found  in  the  literature  to  identify  a  person  in  2D  laser  data 
including  analysis  of  local  minima  [19,  23],  geometric  rules  [24],  using  maximum- 
likelihood  estimation  to  detect  dynamic  objects  [10],  using  AdaBoost  on  a  set 
of  geometrical  features  extracted  from  segments  [1],  or  from  Delaunay  neighbor¬ 
hoods  [20].  Most  similar  to  our  work  is  that  of  Douillard  et  al.  [5]  who  use  Condi¬ 
tional  Random  Fields  to  classify  objects  from  a  collection  of  laser  scans.  In  the  area 
of  vision-based  people  detection,  there  mainly  exist  two  kinds  of  approaches  (see 
[9]  for  a  survey).  One  uses  the  analysis  of  a  detection  window  or  templates  [8,  4], 
the  other  performs  a  parts -based  detection  [6,  11].  Leibe  et  al.  [13]  present  a  peo¬ 
ple  detector  using  Implicit  Shape  Models  (ISM)  with  excellent  detection  results  in 
crowded  scenes.  In  earlier  works,  we  showed  already  extensions  of  this  method  with 
a  better  feature  selection  and  an  improved  nearest  neighbor  search  [21,  22]. 

Existing  people  detection  methods  based  on  camera  and  laser  data  either  use  hard 
constrained  approaches  or  hand  tuned  thresholding.  Zivkovic  and  Krose  [25]  use  a 
learned  leg  detector  and  boosted  Haar  features  from  the  camera  images  and  employ 
a  parts-based  method.  However,  both  their  approach  to  cluster  the  laser  data  using 
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Canny  edge  detection  and  the  use  of  Haar  features  to  detect  body  parts  is  hardly 
suited  for  outdoor  scenarios  due  to  the  highly  cluttered  data  and  the  larger  variation 
of  illumination.  Schulz  [18]  uses  probabilistic  exemplar  models  learned  from  train¬ 
ing  data  of  both  sensors  and  applies  a  Rao-Blackwellized  particle  filter  (RBPF)  to 
track  a  person’s  appearance  in  the  data.  However,  in  outdoor  scenarios  illumination 
changes  often  and  occlusions  are  very  likely,  which  is  why  contour  matching  is  not 
appropriate.  Also,  the  RBPF  is  computationally  demanding,  especially  in  crowded 
environments.  Douillard  et  al.  [5]  also  use  image  features  to  enhance  the  object  de¬ 
tection  but  they  do  not  consider  occlusions  and  multiple  image  detection  hypotheses. 


3  Overview  of  Our  Method 

Our  system  consists  of  three  main  components:  an  appearance  based  detector  that 
uses  the  information  from  camera  images,  a  2D-laser  based  detector  providing 
structural  information,  and  a  tracking  module  that  uses  the  combined  information 
from  both  sensor  modalities  and  provides  an  estimate  of  the  motion  vector  for  each 
tracked  object.  The  laser  based  detection  applies  a  Conditional  Random  Field  (CRF) 
on  a  boosted  set  of  geometrical  and  statistical  features  of  2D  scan  points.  The 
image  based  detector  extends  the  multiclass  version  of  the  Implicit  Shape  Model 
(ISM) [13].  It  only  operates  on  a  region  of  interest  obtained  from  projecting  the 
laser  detection  into  the  image  to  constrain  the  position  and  scale  of  the  detected 
objects.  Then,  the  tracking  module  applies  an  Extended  Kalman  Filter  (EKF)  with 
two  different  motion  models,  fusing  the  information  from  camera  and  laser.  In  the 
following,  we  describe  the  particular  components  in  detail. 


4  Appearance  Based  Detection 

Our  vision-based  people  detector  is  mostly  inspired  by  the  work  of  Leibe  et  al.  [13] 
on  scale-invariant  Implicit  Shape  Models  (ISM).  In  summary,  an  ISM  consists  in 
a  set  of  local  region  descriptors,  called  the  co debook,  and  a  set  of  displacements 
and  scale  factors,  usually  named  votes,  for  each  descriptor.  The  idea  is  that  each 
descriptor  can  be  found  at  different  positions  inside  an  object  and  at  different  scales. 
Thus,  a  vote  points  from  the  position  of  the  descriptor  to  the  center  of  the  object 
as  it  was  found  in  the  training  data.  To  obtain  an  ISM  from  labeled  training  data, 
all  descriptors  are  clustered,  usually  using  agglomerative  clustering,  and  the  votes 
are  computed  by  adding  the  scale  and  the  displacement  of  the  objects’  center  to  the 
descriptors  in  the  codebook.  For  the  detection,  new  descriptors  are  computed  on  a 
test  image  and  matched  against  the  descriptors  in  the  codebook.  The  votes  that  are 
cast  by  each  matched  descriptor  are  collected  in  a  3D  voting  space,  and  a  maximum 
density  estimator  is  used  to  find  the  most  likely  position  and  scale  of  an  object. 
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In  the  past,  we  presented  already  several  improvements  of  the  standard  ISM  ap¬ 
proach  (see  [21,  22]).  Here,  we  show  some  more  extensions  of  ISM  to  further  im¬ 
prove  the  classification  results.  These  extensions  concern  both  the  learning  and  the 
detection  phase  and  are  described  in  the  following. 


4.1  ISM  Extensions  in  the  Learning  Phase 

Sub-Parts:  The  aim  of  this  procedure  is  to  enrich  the  information  from  the  voters 
by  distinguishing  between  different  object  subparts  from  which  the  vote  was  cast. 
We  achieve  this  by  learning  a  circular  histogram  of  interest  points  from  the  training 
data  set  for  each  object  class.  The  number  of  bins  of  this  histogram  is  determined 
automatically  by  using  &-means  clustering.  The  final  number  of  clusters,  here  de¬ 
noted  as  q ,  is  obtained  using  the  Bayesian  Information  Criterion  (BIC).  Note  that 
this  subpart  extraction  does  not  guarantee  a  semantical  subdivision  of  the  object 
(i.e.:  legs,  arms,  etc.  for  pedestrians)  but  it  is  interesting  to  see  that  it  nevertheless 
resembles  this  automatically  without  manual  interaction  by  the  user  (see  Fig.  1,  left 
and  center). 

Template  Masks:  In  the  training  data,  labeled  objects  are  represented  using  a  binary 
image  named  segmentation  mask.  This  mask  has  the  size  of  the  object’s  bounding 
box  and  is  1  inside  the  shape  of  the  object  and  0  elsewhere.  By  overlaying  all  these 
masks  for  a  given  object  class  so  that  their  centers  coincide  and  then  averaging  over 
them,  we  obtain  a  template  mask  of  each  object  class  (see  Fig.  1,  left  and  center). 
This  method  is  more  robust  against  noise  than,  e.g.,  Chamfer  matching  [3],  and  does 
not  depend  on  an  accurate  detection  of  the  object  contours.  We  use  the  template 
mask  later  to  discard  outlier  votes  cast  from  unlikely  areas. 

Superfeatures:  The  original  ISM  maintains  all  features  from  the  training  data  in  the 
codebook  as  potential  voters  and  does  not  distinguish  between  stronger  and  weaker 
votes.  This  has  the  disadvantage  that  often  too  many  votes  are  cast,  even  if  an  oc- 
curance  of  the  object  is  not  likely  given  the  training  data,  and  leads  to  many  false 
positive  detections.  To  overcome  this,  we  propose  to  extract  superfeatures  from  the 
training  data,  i.e.  descriptor  vectors  that  cast  a  stronger  vote  than  standard  features. 
We  keep  these  superfeatures  in  a  separate  codebook  to  avoid  clutter  in  the  implemen¬ 
tation.  A  superfeature  is  defined  by  a  local  density  maximum  in  descriptor  space, 
where  only  feature  vectors  are  considered  that  correspond  to  interest  points  from  a 
dense  area  in  the  image  space  (in  x,  y,  and  scale).  This  definition  ensures  that  for 
superfeatures  a  high  evidence  of  the  occurrence  of  the  object  is  combined  with  a 
high  probability  to  encounter  an  interest  point.  We  compute  superfeatures  by  first 
employing  mean  shift  estimation  on  all  interest  points  found  in  the  training  data  set 
for  each  class,  and  then  clustering  the  feature  vectors  in  descriptor  space  that  cor¬ 
respond  to  the  interest  points  from  the  found  areas  of  high  density.  This  clustering 
is  done  agglomeratively.  In  the  end,  we  select  the  50%  of  the  cluster  centers  that 
correspond  to  the  biggest  clusters.  The  right  part  of  Fig.  1  shows  an  example.  Note 
that  the  superfeatures  inherently  reflect  the  skeleton  of  the  object. 
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Fig.  1  Left  and  Center:  Sub-parts,  depicted  in  colored  slices,  and  template  masks,  in  white.  They 
are  computed  from  the  training  set.  Note  that  even  though  the  subparts  are  computed  unsupervised, 
they  exhibit  some  semantic  interpretation.  Right:  Superfeatures  are  stable  features  in  image  and 
descriptor  space.  This  figure  shows  Shape  Context  descriptors  at  Hessian  interest  points  (in  red) 
for  the  class  ’pedestrian’.  The  position  of  the  superfeatures  are  depicted  in  green. 


4.2  ISM  Extensions  in  the  Inference  Phase 

Sub-Parts  and  Template  Masks:  After  collecting  all  the  votes  for  a  given  set  of  ex¬ 
tracted  input  features  from  a  test  image,  we  first  discard  the  ones  that  are  implausible 
by  placing  the  template  mask  at  the  potential  object  centers  and  removing  the  votes 
that  are  cast  from  outside  the  mask.  For  the  remaining  ones  we  find  the  maximum 
density  point  m  using  mean  shift  and  insert  all  votes  for  m  into  a  circular  histogram 
with  q  bins:  one  per  sub-part  of  the  object.  We  denote  each  such  histogram  as  a 
hypothesis  h  =  (hi,...,hq)  of  an  object’s  position.  The  strength  a  of  a  hypothesis 
is  defined  as  the  sum  of  all  bins,  i.e.  the  number  of  all  voters  for  the  object  center. 
To  find  the  best  hypothesis  we  define  a  partial  order  -<:  based  on  a  function 

q 

h,  -<  hj  Ah(hi,hj)  <  0  where  4A(hj,h,-)  :=  £  sign (h\-h[).  (1) 

k=  1 

Using  this,  we  select  the  hypothesis  with  the  highest  order  (in  case  of  ambiguity 
we  use  the  one  with  the  highest  strength)  for  each  class.  Then,  we  find  the  best 
hypothesis  across  all  classes  as  described  below,  remove  all  its  voters  and  recompute 
the  ordering.  This  is  done  until  a  minimum  hypothesis  strength  omin  is  reached. 
Thus,  the  parameter  omin  influences  the  number  of  false  positive  detections. 
Superfeatures:  Superfeatures  and  standard  features  vote  for  object  centers  in  the 
same  voting  space,  but  the  votes  from  superfeatures  are  weighted  higher  (in  our 
case  by  a  factor  of  2).  Thus,  the  score  of  a  hypothesis  is  higher  if  the  fraction  of 
superfeatures  voting  for  it  is  higher.  In  some  cases  where  an  object’s  shape  visibility 
is  low  only  superfeatures  might  be  used  to  obtain  a  very  fast  detection. 

Best  Inter-Class  Hypothesis:  As  mentioned  above,  we  need  to  rate  the  best  object 
hypotheses  from  all  classes.  To  be  independent  on  an  over-  or  under-representation 
of  a  class  in  the  codebooks,  we  do  this  by  comparing  the  relative  areas  covered 
by  the  voters  from  all  class  hypotheses.  More  precisely,  we  define  a  square  area  y 
around  each  voter  that  depends  on  the  relative  scale  of  the  descriptor,  i.e.  the  ratio 
of  the  test  descriptor’s  scale  and  that  of  the  found  descriptor  in  the  codebook.  The 
fraction  of  the  area  covered  by  all  voters  of  a  hypothesis  and  the  total  area  of  the 
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object  (computed  from  the  template  mask)  is  then  used  to  quantify  the  hypothesis. 
Care  has  to  be  taken  in  the  case  of  overlapping  class  hypotheses.  Here,  we  compute 
the  set  intersection  of  the  interest  points  in  the  overlapping  area  and  assign  their 
corresponding  /  values  alternately  to  one  and  the  other  hypothesis. 


5  Structure  Based  Detection 

For  the  detection  of  objects  in  2D  laser  range  scans,  several  approaches  have  been 
presented  in  the  past  (see  for  example  [1,  16]).  Most  of  them  have  the  disadvantage 
that  they  disregard  the  conditional  dependence  between  data  points  in  a  close  neigh¬ 
borhood.  In  particular,  they  can  not  model  the  fact  that  the  label  li  of  a  given  scan 
point  z i  is  more  likely  to  be  lj  if  we  know  that  lj  is  the  label  of  z j  and  Zj  and  z*  are 
neighbors.  One  way  to  model  this  conditional  independence  is  to  use  Conditional 
Random  Fields  (CRFs)  [12],  as  shown  by  Douillard  et  al.  [5].  CRFs  represent  the 
conditional  probability  p( y  |  z)  using  an  undirected  cyclic  graph,  in  which  each  node 
is  associated  with  a  hidden  random  variable  li  and  an  observation  z In  our  case,  the 
li  is  a  discrete  label  that  ranges  over  3  different  classes  (pedestrian,  car  and  back¬ 
ground)  and  the  observations  z,-  are  2D  points  in  the  laser  scan.  At  this  point  we  omit 
the  mathematical  details  about  CRFs  and  refer  to  the  literature  (e.g.  [5, 17]).  We  only 
note  that  for  training  the  CRF  we  use  the  L-BFGS  gradient  descent  method  [14]  and 
for  the  inference  we  use  max-product  loopy  belief  propagation. 

We  use  a  set  of  statistical  and  geometrical  features  fn  for  the  nodes  of  the  CRF, 
e.g.  height,  width,  circularity,  standard  deviation,  kurtosis,  etc.  (for  a  full  list  see 
[20]).  We  compute  these  features  in  a  local  neighborhood  around  each  point,  which 
we  determine  by  jump  distance  clustering.  However,  we  don’t  use  this  features  di¬ 
rectly  in  the  CRF,  because,  as  stated  in  [17]  and  also  from  our  own  observation, 
the  CRF  is  not  able  to  handle  non-linear  relations  between  the  observations  and  the 
labels.  Instead,  we  apply  AdaBoost  [7]  to  the  node  features  and  use  the  outcome  as 
features  for  the  CRF.  For  our  particular  classification  problem  with  multiple  classes, 
we  train  one  binary  AdaBoost  classifier  for  each  class  against  the  others.  As  a  re¬ 
sult,  we  obtain  for  each  class  k  a  set  of  M  weak  classifiers  m  (decision  stumps)  and 
corresponding  weight  coefficients  a*  so  that  the  sum 

M 

:=£a,-Mi(f(z))  (2) 

i=  1 

is  positive  for  observations  assigned  with  the  class  label  k  and  negative  otherwise. 
We  apply  the  inverse  logit  function  a(x)  =  (l+  e~x)~l  to  g ^  to  obtain  a  classification 
likelihood.  Thus,  the  node  features  for  a  scan  point  z i  and  a  label  U  are  computed  as 
fw(z iji)  =  a(g/.(zz-)).  For  the  edge  features  fe  we  compute  two  values,  namely  the 
Euclidean  distance  d  between  the  points  z ;  and  z j  and  a  value  gij  defined  as 


gij(ii,Zj)  =  siga(gi(zi)gj(zj))(\gi(zi)\  +  \gj(zj)\). 


(3) 
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This  feature  has  a  high  value  if  both  z j  and  z j  are  equally  classified  (its  sign  is 
positive)  and  low  otherwise.  Its  absolute  value  is  the  sum  of  distances  from  the 
decision  boundary  of  AdaBoost  where  g( z)  =  0.  Thus,  we  define  the  edge  features  as 


(4) 


0)r  otherwise. 


The  intuition  behind  Eq.  (4)  is  that  edges  that  connect  points  with  equal  labels  have 
a  non-zero  feature  value  and  thus  yield  a  higher  potential. 


6  Object  Tracking  and  Sensor  Fusion 

To  fuse  the  information  from  camera  and  laser  and  for  object  tracking  we  use  an  Ex¬ 
tended  Kalman  Filter  (EKF)  as  presented  in  [21].  In  our  implementation,  we  use  two 
different  motion  models  -  Brownian  motion  and  linear  velocity  -  in  order  to  cope 
with  pedestrian  and  car  movements.  The  data  association  is  performed  in  the  camera 
frame:  we  project  the  detected  objects  from  the  laser  scan  into  the  camera  image. 
Assuming  a  fixed  minimal  object  height,  we  obtain  a  rectangular  search  region,  in 
which  we  consider  all  hypotheses  from  the  vision  based  detector  for  the  particular 
object  class.  Using  a  previously  calibrated  distance  ro  of  an  object  at  scale  1.0  (us¬ 
ing  the  normalized  training  height),  we  can  estimate  the  distance  rest  of  a  detected 
object  in  the  camera  image  by  multiplying  ro  with  the  scale  of  the  object.  Then,  rest 
is  compared  to  the  measured  distance  rmeas  from  the  laser  and  both  detections  are 
assigned  to  each  other  if  \rmeas  —  rest  |  is  smaller  than  a  threshold  Tj  (in  our  case  2m). 

We  track  cluster  centers  of  gravity  in  the  2D  laser  frame  using  two  system  states: 


one  for  each  motion  model.  Here,  (v™8  ,Vy°8)  is  the  velocity  of  the  cluster  centroid 
(xx°8,yy°8)  and  ci, . . .  ,cn  are  the  probabilities  of  all  n  classes.  We  use  a  static  state 
model  where  the  observation  vector  w  consists  of  the  position  of  the  cluster  and  the 
class  probabilities  for  each  sensor  modality: 


(5) 


Here,  (xcog ,  ycog )  is  a  new  observation  of  a  cluster  center  and  s  denotes  the  number  of 
sensors.  The  matrix  H  models  the  mapping  from  states  to  the  predicted  observation 
and  is  defined  as  H  =  {PT  S\  where  P  maps  to  pose  observations  and  the 

Si  map  to  class  probabilities  per  sensor.  For  example,  for  one  laser,  one  camera  and 
constant  velocity  we  have 


D_  /1  0  0  0  00\  c  _  c  _  /0  0  0  0  1  0\ 

r  —  Vo  1  0  0  0  o)  ^2—  (oooooiU 
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Fig.  2  Left:  For  car  classification,  we  use  codebooks  from  7  different  views.  For  training,  mirrored 
images  are  included  for  each  view  to  obtain  a  wider  coverage.  Center:  For  pedestrians  we  use  2 
codebooks  of  side  views  with  mirroring.  Lateral  views  have  sufficient  information  to  generalize 
frontal/back  views.  Right:  Setup  used  for  the  city  data  set.  Only  a  small  overlap  of  the  cameras’ 
field  of  view  is  used  to  cover  a  larger  part  of  the  laser  scans.  No  stereo  vision  is  used  in  this  work. 


7  Experimental  Results 

To  acquire  the  data,  we  used  a  car  equipped  with  two  CCD  cameras  and  a  2D  laser 
range  finder  mounted  in  front  (see  Fig.  2,  right).  The  3D  transform  between  the  laser 
and  the  camera  coordinate  frame  was  calibrated  beforehand.  We  acquired  training 
data  sets  for  both  sensor  modalities.  For  the  camera,  we  collected  images  of  pedes¬ 
trians  and  cars  that  we  labeled  by  hand.  The  pedestrian  data  set  consists  of  400 
images  of  persons  with  a  height  of  200  pixels  in  different  poses  and  with  different 
clothing  and  accessories  such  as  backpacks  and  hand  bags  in  a  typical  urban  en¬ 
vironment.  The  class  ’car’  was  learned  from  7  different  viewpoints  as  in  [13]  (see 
also  Fig.  2,  left).  Each  car  data  set  consists  of  100  pictures  from  urban  scenes  with 
occlusions.  Car  codebooks  are  learned  using  Shape  Context  (SC)  descriptors  [2]  at 
Hessian-Laplace  interest  points  [15].  The  pedestrian  codebook  uses  lateral  views 
and  SC  descriptors  at  Hessian-Laplace  and  Harris-Laplace  interest  points  for  more 
robustness.  Experience  shows  [13]  that  lateral  views  of  pedestrians  also  generalize 
well  to  front/back  views.  Our  laser  training  data  consists  of  800  annotated  scans 
with  pedestrians,  cars  and  background.  There  is  no  distinction  of  car  views  in  the 
laser  data  as  the  variation  in  shape  is  low.  The  range  data  consists  in  4  layers  where 
each  has  an  angular  resolution  of  0.25°  and  a  maximum  range  of  15m. 

To  quantify  the  performance  of  our  detector  we  acquired  two  datasets  containing 
cars  and  pedestrians.  The  results  of  our  detection  algorithm  are  shown  in  Fig.  3. 
Our  vision  based  detecion  named  ISMe2.0  is  compared  to  the  standard  ISM,  our 
previous  extension  ISMel.0,  and  for  the  pedestrian  class,  with  AdaBoost  trained  on 
Haar  features  (ABH).  For  the  class  ’car’,  we  averaged  the  results  over  all  different 
views.  We  can  see  that  our  method  yields  the  best  results  with  an  Equal  Error  Rate 
(EER)  of  72.3%  for  pedestrians  and  74%  for  cars.  The  improvements  are  mainly 
due  to  a  decreased  rate  of  false  positive  detections.  The  results  of  our  laser  based 
detection  are  shown  in  the  middle  column  of  Fig.  3.  We  can  see  that  our  approach 
using  boosted  CRFs  performs  better  than  standard  AdaBoost.  The  right  column  of 
Fig.  3  depicts  the  results  for  the  combined  detection  using  laser  and  vision.  These 
graphs  clearly  show  that  using  both  sensors  the  number  of  false  positive  detections 
decreases  and  the  hit  rate  increases.  Some  qualitative  results  are  shown  in  Fig.  4 
where  a  passing  car  and  a  crossing  pedestrian  are  correctly  detected  and  tracked. 
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Fig.  3  Quantitative  evaluation.  Upper  row:  pedestrian  detection,  Lower  row:  car  detection.  From 
left  to  right  we  show  the  results  only  using  camera,  only  using  laser,  and  both.  As  we  can  see,  our 
approach  outperforms  the  other  methods  for  both  sensor  modalities.  The  image  based  detection 
is  compared  with  standard  ISM,  our  first  extension  of  ISM  (ISMel.0)  and  AdaBoost  with  Haar 
features.  Our  CRF-based  laser  detector  is  compared  with  AdaBoost.  We  can  also  see  that  the  com¬ 
bination  of  both  sensors  improves  the  detection  result  of  both  single  sensors. 


In  addition,  we  evaluated  our  algorithm  on  a  third,  more  challenging  dataset  ac¬ 
quired  in  the  city  of  Zurich.  It  consists  of  4000  images  and  laser  scans.  The  equal 
error  rates  of  this  experiment  resulted  in  64.1%  (laser-only),  64.1%  (vision-only) 
and  68%  (combined)  for  pedestrians,  and  in  (72.2%, 73.5%, 75.7%)  for  cars.  As 
a  comparison,  we  evaluated  the  state-of-the-art  pedestrian  detector  based  on  His¬ 
togram  of  Oriented  Gradients  [4]  and  ABH  obtained  an  EER  of  36.4  and  8.9. 


Fig.  4  Cars  and  pedestrian  detected  and  tracked  under  occlusion,  clutter  and  partial  views.  In  the 
camera  images,  upper  row,  blue  boxes  indicate  car  detections,  orange  boxes  pedestrian  detections. 
The  colored  circle  on  the  upper  left  comer  of  each  box  is  the  track  identifier.  Tracks  are  shown  in 
color  in  the  second  row  and  plotted  with  respect  to  the  robot  reference  frame. 
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8  Conclusions 

We  presented  a  method  to  reliably  detect  and  track  multiple  object  classes  in  outdoor 
scenarios  using  vision  and  2D  laser  range  data.  We  showed  that  the  overall  perfor¬ 
mance  of  the  system  is  improved  using  a  multiple- sensor  system.  We  presented  sev¬ 
eral  extensions  to  the  ISM  based  image  detection  to  cope  with  multiple  classes.  We 
showed  that  laser  detection  based  on  CRFs  performs  better  than  a  simpler  AdaBoost 
classifier  and  presented  tracking  results  on  combined  data.  Finally,  we  showed  the 
usefulness  of  our  approach  through  experimental  results  on  real-world  data. 
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Abstract  A  convoy  problem  is  formulated  and  solved  for  two  four-wheeled  ve¬ 
hicles.  The  task  is  for  the  second  vehicle  to  follow  the  leader’s  trajectory  with  a 
constant  time  delay.  This  delayed  trajectory  can  be  viewed  as  the  trajectory  of  a 
delayed  leader.  This  novel  constant-time-delay  concept  allows  for  the  estimation 
of  the  delayed  leader’s  speed  and  heading  using  the  vehicle  kinematics.  Decoupled 
longitudinal  and  lateral  controllers  are  developed  based  on  the  constant-time-delay 
approach.  The  lateral  controller  includes  a  look-ahead  feature  to  compensate  for 
steering  delays.  Successful  field  trials  were  conducted  with  full-sized  military  vehi¬ 
cles  on  a  1.3 -kilometre  test  track.  The  tracking  errors  from  differential  global  posi¬ 
tioning  system  (DGPS)  ground  truth  covering  13  kilometres  are  presented. 

1  Introduction 

Motivating  our  research  is  a  military  scenario  in  which  a  vehicle  convoy  traverses 
hostile  territory  to  deliver  supplies.  Naturally,  equipping  every  vehicle  in  the  con¬ 
voy  with  armour  that  will  protect  its  occupants  is  expensive.  To  reduce  the  cost, 
autonomous  unarmoured  supply  vehicles  may  be  used,  whereby  each  autonomous 
vehicle  would  follow  the  trajectory  of  the  vehicle  ahead  of  it.  To  follow  the  vehicle 
ahead,  an  autonomous  vehicle  can  sometimes  take  advantage  of  a  global  positioning 
system  (GPS),  inter-vehicle  communications,  and/or  lane  markers/magnets.  How¬ 
ever,  since  the  vehicle  convoy  is  in  hostile  territory,  GPS  signals  may  be  jammed, 
inter- vehicle  communications  may  be  intercepted,  the  roads  may  be  unstructured. 

Based  on  this  motivating  example,  our  project  goal  is  to  design  and  test  a  control 
system  to  allow  a  convoy  of  full-sized  autonomous  vehicles  with  large  inter- vehicle 
spacing  to  follow  the  lead  vehicle’s  trajectory  without  cutting  comers  on  turns.  The 
control  system  should  use  only  on-board  sensors,  avoiding  the  use  of  GPS,  inter¬ 
vehicle  communications,  and  lane  markers/magnets.  This  paper  reports  our  prelim¬ 
inary  experimental  results  performed  on  two  full-sized  vehicles  where  the  leader 
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vehicle  is  manually  driven  and  the  follower  vehicle  is  autonomously  controlled.  In 
our  field  trials  to  date,  we  do  not  use  lane  marker/magnets  or  radio  communications 
between  vehicles.  However,  we  do  use  GPS  to  measure  the  follower’s  position  due 
to  a  poor  odometry  system. 

To  our  knowledge,  there  have  been  only  a  few  prior  experimental  works  relevant 
to  our  project  goal.  The  most  relevant  is  Gehrig  and  Stein  [4],  who  tested  a  path¬ 
following  strategy  that,  with  the  addition  of  autonomous  speed  control,  could  poten¬ 
tially  follow  the  leader’s  trajectory  at  large  distances  without  cutting  comers.  Their 
‘Control  Using  Trajectory’  (CUT)  algorithm  stores  the  time  history  of  the  leader’s 
path  and  steers  towards  the  leader’s  position  that  is  a  constant  distance  ahead  of 
the  follower’s  current  position.  Although  Gehrig  and  Stein’s  experiments  showed 
improvements  in  tracking  the  leader’s  path  over  a  system  without  CUT,  the  experi¬ 
mental  data  was  limited  to  less  than  15  seconds. 

Other  experimental  works  that  use  only  on-board  sensors  include  Benhimane  et 
al.  [1],  Franke  et  al.  [3],  and  Kehtamavaz  et  al.  [6].  Benhimane  et  al.  developed  a 
vehicle-following  system  with  the  objective  of  tracking  a  virtual  leader  a  constant 
distance  behind  the  leader.  However,  since  the  trajectory  of  the  leader  is  not  stored, 
the  follower  may  cut  corners  on  turns.  Furthermore,  their  experimental  data  was 
limited  to  2  minutes,  and  the  maximum  follower  speed  was  1  m/s.  Both  Franke  et 
al.  and  Kehtarnavaz  et  al.  had  vehicle-following  systems  that  were  based  on  the  fol¬ 
lower’s  range  and  bearing  to  the  leader.  As  such,  both  implemented  steering  controls 
that  simply  steered  toward  the  leader.  Such  steering  controls  are  known  to  deviate 
from  the  leader’s  path  [4]. 

Daviet  and  Parent  [2]  also  performed  vehicle-following  experiments  without  us¬ 
ing  inter- vehicle  communications.  Their  vehicles  travelled  up  to  10  m/s,  but  the 
corresponding  distance  separation  was  only  4.5  metres,  and  the  experimental  data 
shown  was  for  only  30  seconds.  They  also  used  the  strategy  of  simply  steering  to¬ 
wards  the  leader,  but  they  do  suggest  storing  the  leader’s  trajectory  in  a  future  imple¬ 
mentation.  Although  Schneiderman  et  al.  [8]  used  radio  communications,  the  radio 
link  was  used  to  interact  with  the  follower’s  computer  system.  They  demonstrated 
a  path-following  system  with  a  33-kilometre  traverse  at  speeds  of  13.9  to  20.8  m/s 
and  at  following  distances  of  5  to  15  metres.  However,  only  the  steering  was  au¬ 
tonomous,  and  the  follower  did  not  store  the  leader’s  path. 


1.1  Problem  Formulation 

To  meet  our  project  goal,  we  take  a  novel  approach  to  tracking  the  leader’s  trajectory. 
Our  objective  is  for  the  follower  to  track  the  planar  trajectory  of  the  leader  delayed 
by  a  constant  time,  T.  Specifically,  if  (x(t),y(t))  is  the  position  of  the  follower  with 
respect  to  an  inertial  frame  and  (vo(^),yo(0)  *s  the  position  of  the  leader  with  re¬ 
spect  to  the  same  frame,  then  we  want  to  track  (xo(t  —  T),yo(/  —  t)).  For 

brevity,  we  define  the  delayed  leader  position,  (xo (t  —  t)  , yo (t  —  t) ) ,  as  (x^ (t),y^{t)). 
The  leader,  delayed  leader,  and  follower  are  shown  in  Fig.  la.  It  is  important  to  note 
that  our  definition  is  different  from  the  constant  time  headway  [9]  definition.  The 
tracking  error  in  our  definition  is  with  respect  to  the  leader’s  delayed  position,  while 
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the  tracking  error  in  constant  time  headway  is  with  respect  to  the  leader’s  current 
position. 


a.  Leader 

/  Oo(t),yo(T)) 

Delayed  Leader 

Follower^-^^X  \  9  ’ 

y  MA  Vd (t))  :=  (ar0 (t  -  r),  y0(t  -  r)) 

(x(t),y(t)) 


Fig.  1  (a)  Leader,  delayed  leader,  and  follower  in  an  inertial  frame,  (b)  The  leader’s  and  follower’s 
positions  are  related  by  the  follower’s  heading,  6 ,  and  the  range,  p,  and  bearing,  (j),  to  the  leader. 

There  are  two  main  advantages  to  our  approach:  1)  tracking  the  delayed  leader 
provides  us  with  ‘future’  delay ed-leader  positions  since  we  have  measurements  up 
to  the  leader’s  current  position;  and  2)  the  following  distance  varies  based  on  the 
leader’s  speed.  The  first  advantage  allows  us  to  track  the  leader’s  trajectory  without 
having  to  measure  the  leader’s  speed  or  heading.  Instead,  the  delayed  leader’s  speed 
and  heading  are  estimated  using  the  delayed  leader’s  future  positions.  Having  future 
position  measurements  also  allows  our  system  to  use  interpolation  to  handle  the  oc¬ 
casional  data  dropout,  which  is  to  be  expected  with  a  vehicle-following  system  on 
bumpy  roads.  Due  to  space  limitations,  the  details  of  our  interpolation  technique 
are  not  discussed  in  this  paper.  The  second  advantage  naturally  causes  the  follow¬ 
ing  distance  to  be  smaller  when  the  leader  slows  down  on  difficult  portions  of  the 
road,  e.g.,  turns  and  rough  terrain.  The  smaller  following  distance  allows  for  more 
accurate  measurements  of  the  leader’s  position,  which  will  help  the  tracking  during 
those  difficult  portions. 


2  System  Architecture  and  Design 

Given  the  problem  formulation,  the  follower  requires  a  means  to  localize  its  posi¬ 
tion,  (x,y),  and  heading,  6,  relative  to  an  inertial  frame.  This  localization  can  be 
done  using  GPS  or  wheel  odometry.  Since  the  convoy  is  to  operate  in  hostile  ter¬ 
ritory,  our  preference  is  to  use  wheel  odometry.  To  measure  the  leader’s  relative 
position,  we  use  a  pan-tilt-zoom  monocular  camera  system  with  a  colour  tracker 
to  servo  around  a  coloured  target  on  the  back  of  the  leader  [5].  Knowing  the  off¬ 
sets  between  the  camera  and  the  follower’s  rear  axle  and  between  the  target  and  the 
leader’s  rear  axle,  we  can  obtain  the  range,  p,  and  bearing,  (j),  to  the  leader  from  the 
camera’s  output,  as  shown  in  Fig.  lb. 

A  top  level  diagram  of  our  vehicle  control  system  is  shown  in  Fig.  2.  The  camera 
system  outputs  the  range  and  bearing  to  the  leader.  The  odometry  measures  the  fol¬ 
lower’s  speed  and  steering,  (v,y),  and  provides  estimates  of  the  follower’s  position 
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and  heading,  (v,y,  0).  The  range,  bearing,  and  odometric  estimates  are  fed  into  a 
nonlinear  observer  to  produce  estimates  of  the  delayed  leader’s  position,  heading, 
and  speed,  (xd,yd,  0d,  vd),  along  with  estimates  of  a  look-ahead  point’s  position  and 
heading,  (x\ ,yi,  6\).  These  estimates  are  used  by  the  control  laws  to  produce  the  com¬ 
manded  speed  and  steering,  (vc,  yc),  which  are  the  inputs  to  the  follower.  The  details 
of  the  vehicle  model  for  the  follower,  the  control  laws,  and  the  nonlinear  observer 
are  provided  in  the  following  subsections. 


Fig.  2  Top  level  diagram  of  vehicle  control  system. 

2.1  Follower  Vehicle  Model 

For  the  vehicle  kinematics,  we  chose  the  bicycle  model,  which  is  given  by 

v 

i  =  vcos0,  y  =  vsin0,  0  =  —  tan  y . 

a 

where  (x,y)  is  the  position  of  the  rear  axle,  0  is  the  vehicle’s  heading,  d  is  the  dis¬ 
tance  between  the  front  and  rear  axles,  v  is  the  vehicle’s  speed,  and  7  is  the  vehicle’s 
steering  angle.  We  derived  a  local  linear  model  by  examining  the  longitudinal  and 
lateral  tracking  errors,  (^1,^2),  in  the  follower’s  frame.  The  tracking  errors  are  de¬ 
fined  to  be 


e\ 

cos  6  sin  0 

xd—x 

_ei_ 

—  sin  0  cos  0 

yd-y_ 

Linearizing  the  tracking  errors  and  the  bicycle  model  along  a  constant- velocity  tra¬ 
jectory,  we  obtain  a  local  kinematic  model  for  the  follower  given  by 

vd 

ei=vd-v,  e2  =  vde3,  e3  =  -  —  y, 

d 

where  vd  is  the  speed  of  the  delayed  leader  and  £3  :=  0d  —  6  is  the  heading  error. 

For  the  vehicle  dynamics,  we  assume  the  follower  has  an  inner-loop  controller, 
Cv,y,  that  stabilizes  its  throttle  and  steering  dynamics1.  This  assumption  creates  an 
inner/outer  loop  architecture  where  CV)7  stabilizes  the  vehicle  dynamics  in  the  inner 
loop  and  our  controller  controls  the  vehicle  kinematics  in  the  outer  loop,  as  shown 
in  Fig.  3. 


1  This  was  the  case  for  the  vehicles  that  we  employed. 
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Fig.  3  (a)  Inner/outer  loop  architecture  for  the  follower,  (b)  The  outer-loop  controller  is  designed 
by  assuming  the  inner  loop  is  a  unity  gain. 


A  common  practice  with  the  above  architecture  is  to  design  the  outer-loop  con¬ 
troller  by  treating  the  inner  loop  as  a  unity  gain  [7].  This  assumption  works  well  if 
the  gains  of  the  outer-loop  controller  are  kept  low  enough  such  that  the  bandwidth 
of  the  outer  loop  is  approximately  5  to  10  times  smaller  than  the  bandwidth  of  the 
inner  loop.  As  a  result,  the  kinematic  model  of  the  follower  is 

vd 

ei=vd-vc,  ^2=vd^3,  e3  =  -  —  yc.  (1) 

In  our  implementation,  we  validated  the  bandwidth  separation  between  our  inner 
and  outer  loops  through  simulation  and  through  the  actual  tuning  of  the  gains  in 
experimental  trials. 


2.2  Control  Laws 

Since  the  longitudinal  and  lateral  directions  of  (1)  are  decoupled,  it  can  be  shown 
that  choosing 


vc  =  vd  +  kP)i<?i  ,  kpj  >  0 

7c  =  kp,2e2  +  £p,3^3  ,  £p,2  >  ^P,3  >  0 

will  regulate  the  tracking  errors  to  zero  for  the  linearized  model. 

After  some  initial  field  trials,  we  discovered  that  the  follower  was  turning  late, 
which  caused  a  large  lateral  error.  We  hypothesized  that  the  late  turning  was  caused 
by  the  low  gains  in  our  controller  and  the  delays  in  the  vehicle’s  steering  dynamics. 
To  compensate,  we  added  a  look-ahead  feature  for  the  lateral  controller.  We  define 
the  look-ahead  point  as 

(xi(f),yi(f))  :=  (x0(t -T  +  l),y0(t -T  +  l)) ,  0  <  /  <  T  , 
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where  /  is  a  constant  look-ahead  time.  With  a  look-ahead  time  defined,  the  lateral 
and  heading  errors  are  computed  by 

ei  —  —  (x\  —  x )  sin  6  +  (y\  —  y )  cos  6  ,  £3  =  0\  —  G  , 

where  0\  is  the  heading  of  the  look-ahead  point. 

2.3  Nonlinear  Observer 

From  the  control  laws,  it  is  obvious  that  we  need  estimates  of  the  tracking  and 
heading  errors,  (^1,^2,  £3),  and  the  delayed  leader’s  speed,  va-  The  tracking  and 
heading  errors  are  calculated  from  the  state  of  the  follower,  (v,y,  0),  the  state  of 
the  delayed  leader,  (vd,yd,  0d)>  and  the  state  of  the  look-ahead  point,  (x\,y\,0[).  The 
delayed  leader’s  speed  can  be  calculated  from  the  delayed  leader’s  instantaneous 
change  in  position,  (Abyd)- 

The  follower’s  odometry  provides  an  estimate  of  its  state.  The  delayed  leader’s 
position  is  simply  the  leader’s  current  position  delayed  by  T.  From  Fig.  lb,  the 
leader’s  position  can  be  computed  by 

vo  =  v  +  pcos(0  +  0)  ,  yo  =  y  +  p  sin  (0  +  6)  . 

We  use  a  data  buffer  to  emulate  the  constant  time  delay  in  our  implementation.  The 
position  of  the  look-ahead  point  is  calculated  in  the  same  manner. 

From  the  bicycle  kinematics,  the  delayed  leader’s  speed  and  heading  can  by  cal¬ 
culated  by 

vd  =  V^d+^d  ’  0d  =  atan2(yd,*d)  • 

We  obtain  an  estimate  of  id  by  fitting  a  line  to  an  ^-second  window  of  vo  measure¬ 
ments  centred  around  t  —  T,  where  n  is  configurable  and  a  multiple  of  the  data  rate 
for  vo.  A  depiction  is  shown  in  Fig.  4,  where  the  line  is  fitted  using  least  squares. 
The  slope  of  the  line  is  then  used  as  the  estimate  of  id.  The  estimate  of  yd  is  ob¬ 
tained  in  the  same  manner,  thus  allowing  us  to  calculate  Q&.  A  similar  approach  is 
used  to  estimate  the  look-ahead  point’s  heading.  Plots  of  the  estimated  and  actual 
delayed  leader’s  speeds  and  headings  during  a  field  trial  are  given  in  the  next  section 
to  validate  this  windowing  technique. 


Fig.  4  Estimating  id  using  a 
line  fitting  window  centred 
around  t  —  T.  The  window 
size  is  n  seconds,  where  n  is 
configurable  and  a  multiple  of 
the  data  rate  for  jto.  The  line  is 
fitted  using  least  squares,  and 
the  slope  of  the  line  is  used  as 
the  estimate  of  id- 


Line  Fitting  Window 
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3  Field  Trials 

Field  trials  were  conducted  at  Defence  Research  and  Development  Canada  (DRDC)- 
Suffield,  Alberta,  Canada,  in  November,  2008,  with  two  Multi  Agent  Tactical  Sentry 
(MATS)  vehicles.  A  picture  of  the  MATS  leader  vehicle  is  shown  in  Fig.  5.  The 
coloured  target  is  used  by  the  follower’s  camera  system  to  measure  the  range  and 
bearing  to  the  leader.  Each  MATS  vehicle  is  equipped  with  an  on-board  computer, 
a  pan-tilt-zoom  monocular  camera,  a  GPS  antenna,  and  a  data  link  to  a  ground  sta¬ 
tion  to  receive  DGPS  corrections.  The  DGPS  data  serves  to  provide  ground  truth 
for  the  trials.  Each  MATS  is  also  equipped  with  odometric  sensors  that  provide  the 
vehicle’s  current  speed  and  steering  angle. 


Fig.  5  The  MATS  leader 
vehicle.  The  coloured  target 
is  used  by  the  follower’s 
camera  system  to  measure 
the  range  and  bearing  to  the 
leader.  Each  MATS  vehicle 
is  equipped  with  an  on-board 
computer,  a  pan-tilt-zoom 
monocular  camera,  a  GPS 
antenna,  and  a  data  link  to 
a  ground  station  to  receive 
DGPS  corrections. 


The  test  track  is  a  1.3-kilometre  loop  shown  in  Fig.  6.  The  track  is  a  gravel  road 
and  is  approximately  7  metres  wide.  The  most  difficult  portion  of  the  track  is  the 
60-degree  hairpin  turn  located  at  the  north-west  corner  of  the  track. 


Fig.  6  The  1.3-kilometre  test 
track  used  for  field  trials.  The 
track  is  a  gravel  road  and  is 
approximately  7  metres  wide. 
The  most  difficult  portions  of 
the  track  are  the  U-turn  and 
the  hairpin  turn. 


3.1  Odometry  Localization 

Tracking  the  delayed  leader  using  odometry  localization  proved  to  be  difficult  with 
the  follower’s  current  odometric  sensors.  The  problem  stemmed  from  an  encoder 
located  on  the  steering  column  used  to  measure  the  steering  angle.  Since  there  was 
significant  ‘play’  between  the  steering  wheel  and  the  front  wheels,  the  steering  mea¬ 
surement  did  not  accurately  represent  the  angle  of  the  front  wheels  and  was  highly 
sensitive  to  road  slope.  As  a  result,  the  follower’s  heading  estimate  was  very  inaccu- 
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rate,  resulting  in  poor  path  estimates.  An  example  is  shown  in  Fig.  7,  when  tracking 
with  odometry  localization  was  performed  off  the  test  track.  In  this  case,  a  crowned 
road  caused  the  odometry  to  produce  a  circular  path  estimate  when  the  follower  was 
actually  traveling  straight. 


Fig.  7  The  follower’s  actual 
path  in  comparison  with  its 
path  estimated  from  odo- 
metric  sensors.  A  crowned 
road  caused  the  odometry  to 
produce  a  circular  path  esti¬ 
mate  when  the  follower  was 
actually  traveling  straight. 


3.2  DGPS  Localization 

Using  DGPS  for  localization  of  its  position,  the  follower  was  able  to  successfully 
track  the  delayed  leader  for  10  laps  of  the  1.3-kilometre  track.  A  summary  of  the 
test  results  is  shown  in  Table  1.  The  constant  time  delay  was  set  to  8  seconds,  and 
the  look-ahead  time  was  set  to  3  seconds.  The  mean  follower  speed  for  the  entire 
traverse  was  2.2  m/s  (7.9  km/h),  and  the  mean  following  distance  was  19  metres. 
The  mean  lateral  error  was  0.07  metres  with  a  standard  deviation  of  0.46  metres. 
The  maximum  absolute  lateral  error  was  2.73  metres,  which  occurred  during  one  of 
the  turns  at  the  hairpin.  Since  it  is  more  natural  to  calculate  an  error  with  respect 
to  the  reference,  the  lateral  errors  here  are  calculated  in  the  delayed  leader’s  frame. 
The  difference  between  tracking  errors  in  the  delayed  leader’s  frame  and  tracking 
errors  in  the  follower’s  frame  is  shown  in  Fig.  8. 

Plots  of  DGPS  ground  truth  data  for  a  typical  lap  around  the  track  are  shown 
in  Fig.  9.  Figure  9a  shows  the  leader’s  and  follower’s  paths,  while  a  close-up  of 
the  hairpin  turn  is  shown  in  Fig.  9b.  The  longitudinal  and  lateral  errors  in  the  de¬ 
layed  leader’s  frame  are  shown  in  Fig.  9c,  along  with  the  delayed  leader’s  and  fol¬ 
lower’s  speeds.  The  simultaneous  large  error  increases  around  the  50-second  and 
400-second  marks  correspond  to  the  U-turn  and  the  hairpin  turn,  respectively.  It 
should  be  noted  that  the  longitudinal  error  did  not  get  to  zero  during  the  13 -kilometre 
traverse.  We  have  fixed  this  issue  by  adding  an  integral  gain  on  the  longitudinal  error 
in  the  control  law  for  the  commanded  speed.  Unfortunately,  due  to  time  and  weather 
constraints,  we  have  not  been  able  to  properly  tune  and  test  the  follower  with  the 
improved  controller.  Since  the  longitudinal  error  was  always  positive  and  our  con¬ 
trol  law  is  vc  =  Vd  +  the  follower’s  speed  was  always  slightly  larger  than  the 

delayed  leader’s  speed.  However,  because  the  follower  deviated  from  the  leader’s 
path,  it  was  never  able  to  catch  up  to  the  delayed  leader,  resulting  in  its  inability  to 
reduce  the  longitudinal  error  to  zero.  In  Fig.  9d,  the  delayed  leader’s  actual  speed 
and  heading  are  compared  with  their  estimates.  The  similarities  of  the  plots  suggest 
that  windowing  around  t  —  T  yields  accurate  speed  and  heading  estimates. 
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Table  1  Summary  of  results  from  10  laps  of  1.3-kilometre  track.  The  constant  time  delay  was  set 
to  8  seconds,  and  the  look-ahead  time  was  set  to  3  seconds.  The  controller  gains  (kp?i ,  kp?2,  kp, 3)  = 
(0.08  s-1 ,  0.04,  0.04).  The  lateral  error,  £2,  is  calculated  in  the  delayed  leader’s  frame,  and  tf  is  the 
finishing  time  for  the  entire  13-kilometre  traverse. 


Description 

Symbol 

Value 

Mean  Follower  Speed 

V 

2.2  m/s  (7.9  km/h) 

Mean  Following  Distance 

19  m 

Mean  Lateral  Error  =b Standard  Deviation 

T,Joe2{<l)dq 

0.07  m  ±0.46  m 

Maximum  Absolute  Lateral  Error 

max  |  £2  (01 

2.73  m 

Fig.  8  The  longitudinal  and 
lateral  errors,  (£] ,  £2),  are  in 
the  delayed  leader’s  frame, 
and  the  longitudinal  and 
lateral  errors,  (^1,^2),  are  in 
the  follower’s  frame. 


Delayed  Leader’s  Frame 
£l 


Vd 

line  A 

....V"*1 


y 


£2 


c. 


C2 


Follower’s  Frame  Cl 


Leader’s  Path"  ■  ■  Follower’s  Patlj 


Fig.  9  (a)  The  leader’s  and  follower’s  paths,  (b)  A  close-up  of  the  paths  during  the  hairpin  (c)  The 
longitudinal  and  lateral  tracking  errors,  and  the  delayed  leader’s  and  follower’s  speeds,  (d)  The 
delayed  leader’s  speed  and  heading  compared  with  their  estimates  from  windowing. 


4  Summary  and  Future  Work 

We  have  introduced  the  novel  concept  of  tracking  the  trajectory  of  a  vehicle  ahead 
delayed  by  a  constant  time.  This  constant  time  delay  forms  the  basis  for  our  con- 
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troller  design  and  allows  us  to  use  ‘future’  position  measurements  to  estimate  the 
delayed  leader’s  speed  and  heading.  Successful  field  trials  were  conducted  with 
two  full-sized  vehicles  over  a  13-kilometre  traverse  in  which  the  follower  vehicle 
achieved  a  mean  lateral  error  of  0.07  metres  with  a  standard  deviation  of  0.46  me¬ 
tres. 

For  future  work,  we  would  like  to  perform  vehicle-following  experiments  with 
odometry  localization.  We  are  confident  that  odometry  localization  will  work  with 
our  approach  as  long  as  the  odometric  estimates  are  reasonably  accurate  over  t 
(the  constant  time  delay)  seconds.  To  fix  our  poor  heading  estimate,  we  plan  to 
implement  a  heading  gyro  on  the  follower.  We  would  also  like  to  conduct  tests  with 
multiple  followers  and  at  higher  speeds.  Testing  multiple  followers  will  provide 
us  with  important  data  on  how  tracking  errors  propagate  in  our  system.  To  test  at 
higher  speeds,  we  plan  to  implement  gain  scheduling  for  our  lateral  controller  since 
our  lateral  closed-loop  system  is  dependent  on  the  delayed  leader’s  speed.  These 
tests  will  further  validate  the  feasibility  of  our  approach  and  will  bring  us  closer  to 
an  operational  autonomous  convoy. 
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Swing  trajectory  control  for  large  excavators 
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Abstract  There  is  a  strong  push  within  the  mining  sector  to  automate  equipment 
such  as  large  excavators.  A  challenging  problem  is  the  control  of  motion  on  high  in¬ 
ertia  degrees  of  freedom  where  the  actuators  are  constrained  in  the  power  they  can 
deliver  to  and  extract  from  the  system  and  the  machine’s  underlying  control  system 
sits  between  the  automation  system  and  the  actuators.  The  swing  motion  of  an  elec¬ 
tric  mining  shovel  is  a  good  example.  This  paper  investigates  the  use  of  predictive 
models  to  achieve  minimum  time  swing  motions  in  order  to  address  the  question 
what  level  of  performance  is  possible  in  terms  of  realizing  minimum  time  motions 
and  accurate  positional  control.  Experiments  are  described  that  explore  these  ques¬ 
tions.  The  work  described  is  associated  with  a  project  to  automate  an  electric  mining 
shovel  and  whilst  the  control  law  discussed  here  is  a  much  simplified  form  of  that 
used  in  this  work,  the  experimental  study  sheds  considerable  light  on  the  problem. 


1  Introduction 

An  electric  mining  shovel  (EMS)  is  a  large  electro-mechanical  excavator  commonly 
used  in  open-cut  mining  to  load  haul  trucks  .  They  are  critical  production  units  at 
most  open-cut  mine  sites  and  there  is  an  ongoing  need  to  improve  their  productivity 
through  automation  of  the  loading  process.  These  machines  are  actuated  by  DC 
motors,  have  three  primary  degrees  of  freedom  used  for  excavation  and  achieve 
mobility  through  crawler  tracks.  CRCMining  and  the  CSIRO  have  been  working 
together  on  the  development  of  an  automation  system  for  mining  shovels. 

An  automation  system  replacing  the  operator  must  necessarily  implement  a  po¬ 
sition  servo  capability.  Our  particular  interest  in  this  paper  is  in  achieving  position 
control  for  the  swing  motion.  Control  of  this  freedom  presents  challenges  because  (i) 
the  rotational  inertia  of  the  machine  house  about  the  swing  axis  is  very  large  relative 
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to  the  effective  inertia  of  the  swing  motors  (ii)  the  rates  at  which  the  swing  motors 
can  deliver/extract  energy  to  the  swing  motion  are  sufficiently  constrained  that  they 
become  important  influences  on  the  control  problem  and  (iii)  in  dealing  with  the 
issues  that  arise  because  of  (i)  and  (ii),  the  existing  control  system  for  swing  mo¬ 
tion  has  a  hybrid  structure  that  must  be  accommodated  for  in  the  automation  layer 
control  system.  The  last  of  these  points  emphasizes  the  challenge  of  working  with 
multi-layered  control  systems  where  various  parties  develop  and  support  the  differ¬ 
ent  layers.  In  this  problem  the  automation  layer  control  system  must  be  removable 
so  that  the  machine  can  be  operated  by  a  human  operator.  To  achieve  this  and  for 
reasons  of  safety  integrity,  the  automation  layer  produces  outputs  that  feed  in  at  the 
same  point  as  the  references  provided  by  the  operator  joysticks. 

The  strategy  we  explore  in  this  paper  considers  the  swing  motion  in  isolation 
from  the  other  degrees  of  freedom  and  looks  to  develop  an  approximate  minimum 
time  controller  that  is  inspired  by  the  Pontryagin  minimum  principle  [5].  The  con¬ 
trol  law  uses  a  model  of  the  swing  motion  with  the  input  being  the  joystick  reference 
and  the  output  being  the  swing  angle  in  a  receding-horizon  framework  to  determine 
switching  points  for  the  joystick  references  that  deliver  near  minimum-time  trajec¬ 
tories.  The  model  of  the  swing  motion  control  system  has  been  described  previously 
in  Ref.  [6]. 

The  paper  has  the  following  structure.  Section  2  summarizes  the  characteristics 
of  the  control  system  and  electric  drive  dynamics  for  the  swing  drive.  Section  3 
discusses  the  state-space  model  formulation  and  the  application  of  this  modeling 
technique  to  command  the  shovel  to  a  desired  swing  angle.  In  Section  4  we  provide  a 
demonstration  of  the  use  of  these  models  for  trajectory  control  using  the  Pontryagin 
inspired  framework  control  system. 


2  Swing  drive  dynamics 

The  shovel  used  in  this  study  has  an  ABB  DCS/DCF600  Multi-Drive  controller  to 
regulate  motor  speed,  armature  current  and  field  current  in  the  swing  DC  motor. 
The  controller  is  made  up  of  four  integral  components;  a  PID  or  PI  motor  speed 
control  loop,  an  armature  current  saturation  limiter,  a  PI  current  control  loop  and  an 
EMF-field  current  regulator. 

The  swing  drive  uses  a  combination  of  torque  control  and  bang-bang  speed  con¬ 
trol,  whereby  the  swing  joystick  position  generates  a  piecewise  speed  reference 
and  an  armature  current  saturation  limit.  A  schematic  of  the  swing  drive  model 
is  shown  in  Figure  1 .  The  difference  between  the  reference  and  actual  swing  motor 
speed  feeds  the  Proportional-Integral-Derivative  (PID)  speed  controller  incorporat¬ 
ing  derivative  filtering.  The  output  of  the  speed  controller  is  scaled  into  a  reference 
armature  current  that  is  the  limited  proportionally  according  to  the  amplitude  of  the 
swing  joystick  reference.  The  error  between  the  limited  current  reference  and  the 
actual  armature  current  feeds  into  a  PI  current  controller  that  outputs  an  armature 
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Table  1  Nomenclature 


Category 

Notation 

Description 

Model  Inputs 

/ 

coulomb  friction  (N) 

h 

field  current(A) 

j 

joystick  reference 

T 

gravitational  torque  load  (Nm) 

COd 

desired  motor  speed  (rad/s) 

Model  States 

e(0 

motor  speed  error  in  the  speed  controller  (rad/s) 

ei 

armature  current  error  in  the  current  controller  (A) 

I 

armature  current 

id 

reference  armature  current  prior  to  saturation  (A) 

0 

motor  position  (rad) 

CO 

motor  speed  (rad/s) 

Controller  Parameters 

G 

describing  function  gain 

Kco 

speed  controller  proportional  gain 

Kj 

current  controller  proportional  gain 

KTi 

speed  to  current  scaling 

Tl 

Jco 

speed  controller  integration  time  constant  (s) 

rrd 

JCQ 

speed  controller  differential  time  constant  (s) 

T f 

1co 

speed  controller  filter  time  constant  (s) 

Ti 

JI 

current  controller  integration  time  constant  (s) 

DC  Drive  Parameters 

b 

drive  damping  coefficient 

J 

drive  inertia  resolved  to  the  motor  (Nm/s) 

Kt 

motor  torque  constant  (Nm/A) 

Kem  f 

motor  back  EMF  constant 

L 

motor  armature  inductance  (Henries) 

R 

motor  armature  resistance  (Ohms) 

Subscripts 

s 

swing  drive 

voltage  to  the  swing  motor.  The  swing  motor  has  a  constant  field  current  with  the 
DCF600  maintaining  the  field  voltage  at  a  steady  level. 

Due  to  its  hybrid  nature,  modelling  the  shovel  swing  drive  effectively  requires  a 
means  for  incorporating  the  non-linear  saturation  effects  seen  in  the  motor  armature 
currents.  To  include  these  effects  into  the  prediction  models  a  sinusoidal  input  de¬ 
scribing  function  is  used  [6] .  The  describing  function  has  been  used  for  the  study  of 
limit  cycles  in  non-linear  dynamic  systems  [2,  3]  and  is  used  here  for  armature  cur¬ 
rent  saturation.  The  basic  idea  of  the  describing  function  approach  is  to  replace  each 
non-linear  element  in  a  dynamic  system  with  a  quasi-linear  descriptor  or  describing 
function  equivalent  gain  whose  magnitude  is  a  function  of  the  input  amplitude. 

The  drive  prediction  model  is  presented  as  continuous,  linear  state  space  systems 
with  the  form 

x  =  Ax  +  Bu  (1) 

The  input  vector  u ,  contains  the  reference  motor  speeds  generated  from  the  joy¬ 
stick  signals,  the  static  torque  load  on  the  motor  due  to  gravitational  effects  and  a 
coulomb  friction  disturbance  input.  The  state  vector  x,  contains  armature  current, 
the  motor  speed,  the  motor  position,  the  integrals  of  the  error  in  the  speed  and  cur- 
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Swing  PID  Speed  Control 


Swing  PI  Current  Control 


Fig.  1  Swing  drive  model  schematic 


rent  controllers  and  the  additional  state  of  swing  reference  armature  current  prior  to 
the  saturation  limit  .  This  state  arises  from  the  derivative  component  in  the  swing 
motor  speed  controller.  The  full  state  space  model  for  the  swing  drive  is  given  in 
equation  2  [6].  The  coulomb  friction  component  is  neglected  in  this  work. 
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3  Near-minimum  time  control  law 


The  state  space  model  for  the  swing  drive  is  used  to  design  a  control  loop  to  replace 
operator  joystick  input  with  the  aim  of  minimum  time  swing  motion  between  two 
points.  To  achieve  as  close  as  possible  to  minimum  time,  ’bang-bang’  control  action, 
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using  the  Pontryagin  minimum  principle  [5]  was  used  to  achieve  time  optimal  mo¬ 
tion  via  computation  of  an  optimal  switching  point.  This  principle  relies  on  the  fact 
that  solution,  using  on  the  maximum  and  minimum  extremes  of  control  input,  must 
be  time  optimal.  The  integration  of  the  state  space  model  forwards  in  time  includes 
the  effect  of  the  swing  drive  rate  limiting.  The  drive  is  rate  limited  to  250%/s,  such 
that  transitioning  between  input  extremes  (±100%)  takes  0.8s.  This  approach  also 
required  the  addition  of  a  terminal  controller  to  achieve  zero  steady  state  error.  The 
controller  is  presented  in  figure  2(a). 

The  switching  point  is  defined  as  the  moment  where  the  input  changes  between 
maximum  to  minimum  (or  vice  versa)  such  that  the  desired  state  is  reached  (zero 
position  error  and  velocity).  The  state  space  model  is  used  in  two  ways.  Firstly, 
prior  to  reaching  the  switching  point,  the  model  is  used  to  check  that  a  desired 
decelerating  swing  joystick  reference  (to  be  time  optimal,  this  reference  would  be 
90  -  100%)  would  bring  the  swing  drive  to  rest  prior  to  reaching  the  target  swing 
point  (figure  2(b)). 

Once  the  switching  point  is  reached,  the  state  space  model  is  then  used  to  com¬ 
pute  what  decelerating  swing  joystick  reference  is  required  to  bring  the  swing  drive 
to  rest.  The  solution  of  this  problem  is  obtained  through  the  bisection  method.  A 
joystick  reference  is  computed  and  applied  at  10Hz  to  ensure  real-time  behavior  of 
the  controller. 

As  the  capture  region  is  entered  (figure  2(b)),  the  state  space  model  predictive 
controller  is  augmented  by  the  feedback  capture  controller.  The  capture  controller 
is  added  to  account  for  any  uncertainties  that  exist  within  the  open  loop  state  space 
model  predictive  controller.  The  capture  controller  contains  terms  for  proportional, 
derivative  and  integral  control  with  various  anti- windup  strategies.  As  the  swing 
drive  reached  zero  velocity,  the  contributions  to  the  swing  drive  input  from  the 
model  predictive  controller  are  removed  and  the  capture  controller  was  left  to  main¬ 
tain  swing  angle. 

4  Demonstration  of  swing  control 

This  section  presents  the  results  of  the  application  of  the  prediction  models  for 
rope  shovel  trajectory  control  using  model  predictive  controller  alone.  For  the  swing 
drive,  we  use  the  a  model  of  the  plant  to  be  controlled  to  predict  the  future  behaviour 
of  the  plants  output.  This  feed-forward  characteristic  allows  for  future  demands  to 
be  incorporated  into  the  present  control  input,  enabling  control  action  to  bring  the 
shovel  to  rest  to  take  place  prior  to  attainment  of  the  goal  point.  Feedback  control 
alone  would  only  allow  for  control  action  to  occur  once  the  goal  point  is  attained 
while  not  at  rest.  The  model  is  propagated  forward  in  time  using  a  stiff  first  order 
implicit  integrator  with  a  time  interval  of  0.05  seconds. 

Here  we  present  the  results  of  requesting  a  90°  and  120°  change  in  swing  an¬ 
gle,  beginning  at  rest.  The  swing  angle  of  the  shovel  is  sensed  through  a  resolver 
attached  to  the  swing  drive  transmission,  with  known  gain,  and  the  swing  velocity 
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Model  Predictive  Controller 


Feed  back  (capture  controller) 

(a) 


Fig.  2  (a)  Swing  drive  controller,  (b)  Switch  point  determination. 


is  obtained  through  differentiation  of  the  swing  resolver  output  with  time.  Figure  3 
shows  change  in  swing  angle  against  the  change  in  swing  velocity. 

It  can  be  seen  that  the  desired  changes  in  swing  angle  are  faithfully  reproduced, 
however,  the  swing  drive  is  unable  to  come  to  a  complete  rest.  It  was  expected  that 
the  open  loop  nature  of  this  controller  would  struggle  to  bring  the  swing  drive  to  rest. 
During  the  initial  swing  cycle,  the  swing  drive  accelerates  under  a  constant  swing 
reference  of  50%  (these  trajectories  will  not  be  time  optimal)  until  the  switching 
point  is  reached.  This  choice  of  swing  reference,  however,  still  sees  the  machine 
reach  close  to  it’s  maximum  rotational  speed  of  \4A°/s  (see  figures  3  and  6(a)). 
As  the  input  switching  point  is  reached,  the  prediction  models  are  used  to  compute 
a  reference  that  will  bring  the  shovel  to  rest  at  the  desired  swing  angle  with  the 
reference  applied  once  it  is  computed.  It  can  be  clearly  seen  that  as  the  switching 
point  is  reached,  and  a  decelerating  reference  is  applied,  the  shovel  decelerates  faster 
than  expected.  This  conclusion  is  reached  because  after  the  initial  deceleartion,  the 
shovel  speed  decreases  its  deceleration  rate.  At  the  next  time  sample  the  model 
then  predicts  that  a  lesser  decelerating  reference  is  required  to  compensate.  In  the 
concluding  stages  of  the  swing  cycle,  the  effects  of  mechanical  backlash  and  the 
sensitivity  of  the  predictive  controller  joystick  reference  output  to  small  remaining 
swing  angles  results  in  the  oscillatory  behaviour  seen  in  figure  3. 


Fig.  3  Swing  angle  versus  swing  velocity  for  near-minimum  time  controller 
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4.1  Feedback  capture  control 


The  previous  section  demonstrated  that  the  use  of  a  predictive  controller  is  not 
sufficient  to  bring  the  swing  drive  to  rest.  As  indicated  in  the  overall  swing  drive 
controller  design  in  figure  2(a),  a  feedback  capture  controller  is  added.  This  con¬ 
troller  becomes  active  once  the  measured  swing  angle  enters  the  capture  region 
(figure  2(b)).  Proportional  (P,  Kp)  ,  Proportional  Derivative  (PD,  Kp,  Kj,  t)  and 
Proportional  Integral  (PI,  Kp,  Ki)  controllers  were  examined  for  their  suitability  to 
this  application. 

A  number  of  anti-windup  strategies  for  the  integrator  within  the  capture  con¬ 
troller  were  investigated  due  to  the  potential  for  the  control  input  to  saturate  (and 
’windup’  the  integrator)  because  of  the  limitiations  in  the  existing  control  system 
and  the  swing  drive.  The  discrete  strategies  investigated  were  reseting  the  integrator 
when  saturation  was  detected  (Reset),  and  holding  the  value  of  the  integrator  con¬ 
stant  when  saturation  occurs  (Hold-Reset).  A  feedback  anti- windup  strategy,  where 
the  difference  between  the  unsaturated  and  saturated  outputs  of  the  controller  (  zero 
when  there  is  no  saturation)  is  fed  back  to  the  integrator  to  ensure  that  the  integrator 
does  not  wind  up  was  also  used  (Inner-Feedback). 

The  capture  controller  performance  was  evaluated  on  the  research  shovel  from 
both  a  stationary  and  non- stationary  initial  condition.  The  resulting  swing  drive 
speeds  and  swing  angle  were  measured.  The  results  presented  in  figure  4  are  for 
step  inputs  of  positive  10°  from  a  stationary  position  and  a  variable  step  input  to 
bring  the  shovel  to  0°  from  a  non- stationary  position.  This  replicates  how  the  cap¬ 
ture  controller  would  have  to  behave  as  the  shovel  enters  the  capture  region  in  the 
overall  control  strategy.  Each  capture  test  was  done  independently  as  the  parameters 
governing  the  controllers  were  explored.  Table  2  summarizes  the  parameters  that 
were  used. 


Table  2  Capture  controller  parameters 


Condition 

Type 

Kp 

Ki 

Kd 

T 

Anti-windup 

1 

P 

200 

~ 

- 

- 

- 

2 

PI 

200 

20 

- 

- 

- 

3 

PI 

200 

20 

- 

- 

Reset 

4 

PI 

200 

20 

- 

- 

Hold-Reset 

5 

PI 

200 

20 

- 

- 

Inner-Feedback 

6 

PD 

200 

- 

50 

0.1 

- 

Figure  4(a)  plots  the  response  of  the  controllers  for  a  10°  change  in  angle  from  a 
stationary  initial  condition.  The  P  and  PD  capture  controllers  perform  best  by  min¬ 
imizing  the  overshoot  of  the  target  angle,  which  is  important  when  avoiding  colli¬ 
sions  with  load  devices,  but  are  limited  in  the  steady  state  error  by  the  mechanical 
backlash  present  in  the  machine  transmission.  The  backlash  has  been  determined 
previously  to  be  of  the  order  of  ±4°.  Adding  an  integral  term  added  additional  over- 
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0  (deg)  0  (deg) 

(a)  Capture  controller  from  stationary  position  (b)  Capture  controller  from  non- stationary  posi¬ 
tion 

Fig.  4  Capture  controller  test  results  on  research  shovel 


shoot  to  the  solution  profile  and  the  introducing  different  anti- windup  strategies  had 
negligible  effect  amongst  themselves  while  adding  more  oscilation.  Cases  1 ,  4  and  6 
(table  2)  were  also  investigated  for  a  non- stationary  initial  condition  to  more  closely 
resemble  how  it  would  operate  in  the  overall  swing  control.  The  results  are  given  in 
figure  4(b).  Here,  again,  the  P  and  PD  capture  controllers  performed  the  best. 

4.2  Near-minimum  time  controller  with  feedback  capture 

This  section  presents  the  results  of  the  application  of  the  prediction  models  for 
shovel  swing  trajectory  control  using  the  near-minimum  time  controller  and  its  cap¬ 
ture  with  feedback  control.  The  capture  controller  used  is  the  P  controller  (Case  1) 
detailed  in  table  2.  Figure  6(a)  shows  change  in  swing  angle  against  the  change  in 
swing  velocity  as  a  result  of  requesting  a  90°  and  1 80°  change  in  swing  angle,  be¬ 
ginning  and  ending  at  rest.  The  initial  behaviour  of  the  swing  speed  versus  swing 
angle  profile  is  highly  dependent  on  whether  or  not  the  transmission  was  perfectly 
meshed  when  started.  If  the  transmission  were  not  perfectly  meshed  it  is  possible 
for  swing  speed  to  increase  without  any  change  in  swing  angle  as  the  backlash  is 
taken  up  (see  90°  swing  angle  of  figure  6(a)). 

As  was  observed  in  Section  4,  the  swing  drive  accelerates  under  a  constant  swing 
reference  of  50%  until  the  switching  point  is  reached.  As  the  input  switching  point  is 
reached,  the  prediction  models’  joystick  references  are  applied  to  bring  the  shovel  to 
rest  at  the  desired  swing  angle.  After  the  switching  point  the  effects  of  unmodelled 
friction  are  again  observed.  Unlike  the  results  of  Section  4,  here,  when  combined 
with  the  capture  controller,  the  shovel  swing  drive  is  brought  to  rest  at  the  desired 
swing  location  within  mechanical  backlash  tolerances. 

Figure  5  plots  the  computed  swing  joystick  reference  profile  against  swing  angle 
for  both  cases  presented  in  figure  6(a).  Both  profiles  commence  with  an  initial  ref¬ 
erence  of  —50%  to  move  the  swing  angle  towards  the  target.  As  the  switching  point 
is  determined,  the  reference  become  positive  to  decelerate  the  swing  motion.  From 
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this  point  on  until  the  capture  region  is  encountered  (within  10°  of  the  target),  the 
joystick  reference  is  computed  by  the  state  space  prediction  model. 

There  is  clearly  large  variation  in  the  applied  joystick  reference  (±10%)  during 
the  decleration  stage  outside  of  the  capture  region.  The  reasons  for  this  are  twofold. 
Firstly,  the  sample  rate  of  10Hz,  chosen  to  maintain  realtime  behavior  while  solving 
the  bisection  problem,  appears  to  be  too  low.  However,  the  behaviour  of  the  shovel 
as  seen  in  figure  6(a)  indicates  that  the  overall  system  is  not  susceptible  to  these 
variations  in  control  input.  The  second  reason  lies  in  the  sensitivity  of  the  predicted 
swing  angle  at  zero  speed  to  the  applied  joystick  reference.  Figure  6(b)  shows  this 
behaviour.  Clearly,  when  operating  in  the  region  where  Js  >  50%,  a  20%  change 
in  joystick  reference  results  in  a  comparitively  small  stationary  swing  angle  value. 
This  behavior  can  be  understood  conceptually  by  understanding  that  the  joystick 
reference  is  essentially  a  torque  with  it  having  an  inverse  relationship  to  the  total 
angular  displacement  over  which  it  is  applied. 

As  the  capture  region  is  encountered,  the  capture  controller  output  is  added  to 
that  of  the  predictive  controller  (figure  5).  Initially,  the  capture  controller  acts  to 
reduce  the  decelerating  reference.  This  decrease  is  not  incorporated  in  the  predictive 
controllers  calculations  and  it  acts  to  increase  the  decelerating  reference  in  the  next 
prediction,  leading  to  the  initial  oscillitary  profile  in  joystick  reference.  Any  other 
spikes  in  decelerating  reference,  particularly  in  figure  5(a),  are  believed  to  be  due  to 
model  inaccuracies  but  this  does  not  make  it  unfit-for-purpose.  As  the  target  swing 
angle  is  reached  and  the  swing  drive  becomes  stationary,  the  predictive  controller 
input  is  stopped,  leaving  only  the  capture  controller  to  regulate  position. 


Fig.  5  Swing  joystick  reference  contoller  output 


5  Conclusions 

This  paper  has  presented  an  exploration  of  near-minimum  time  control  on  the  swing 
drive  of  an  electric  mining  shovel.  While  the  state  space  model  deals  only  with  the 
dominant  effects  of  the  hybrid  structure  of  the  base  machine  controller,  the  main 
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Fig.  6  (a)  Swing  angle  versus  swing  velocity  for  near-minimum  time  controller  and  capture  (b) 
Simulation  of  swing  angle  versus  swing  velocity  for  various  decelerating  references 


outcome  of  the  paper  has  been  to  show  that  it  is  possible  to  achieve  positional  con¬ 
trol  within  a  predictive  control  framework.  The  single-input  single-output  structure 
of  the  controller  limits  its  application  for  the  broader  automation  problem  which 
requires  coordinated  motion  of  all  degrees  of  freedom.  However,  this  work  has  es¬ 
tablished  that  a  fit-for-purpose  control  structure  can  be  implemented. 
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The  Development  of  a  Telerobotic  Rock  Breaker 


Elliot  Duff,  Con  Caris,  Adrian  Bonchis,  Ken  Taylor,  Chris  Gunn  and  Matt  Adcock 


Abstract  This  paper  describes  the  development  of  a  tele-robotic  rock  breaker  de¬ 
ployed  at  a  mine  over  lOOOkms  from  the  remote  operations  centre.  This  distance  in¬ 
troduces  a  number  of  technical  and  cognitive  challenges  to  the  design  of  the  system, 
which  have  been  addressed  with  the  development  of  shared  autonomy  in  the  con¬ 
trol  system  and  a  mixed  reality  user  interface.  A  number  of  trials  were  conducted, 
culminating  in  a  production  field  trial,  which  demonstrated  that  the  system  is  safe, 
productive  (sometimes  faster)  and  integrates  seamlessly  with  mine  operations. 


1  Introduction 


A  rockbreaker  consists  of  a  large  serial  link  manipulator  arm  having  4  DOF  that  is 
fitted  with  a  hydraulic  hammer  and  used  throughout  the  mining  industry  to  break 
oversized  rocks.  CSIRO  has  been  contracted  by  Rio  Tinto  Iron  Ore  to  install  a  tele- 
robotic  control  system  to  the  primary  rockbreaker  at  the  West  Angelas  mine,  situated 
over  1000km  north-east  of  Perth  in  Western  Australia.  Figure  la  shows  the  rock¬ 
breaker  installation  at  the  ROM  (Run  of  Mine)  bin.  The  bin  is  fitted  with  horizontal 
bars  at  the  bottom  that  prevent  oversized  rocks  from  entering  the  crusher  below  (see 
Figure  lb).  This  arrangement  is  commonly  referred  to  as  a  grizzly.  Haul  trucks  car¬ 
rying  ore  from  a  nearby  quarry  dump  their  load  into  the  ROM  bin.  Any  oversize 
rocks  in  the  bin  are  crushed  using  the  rockbreaker  arm.  Until  now,  an  operator  has 
had  to  step  out  of  a  control  room  adjacent  to  the  bin  and  use  a  line-of- sight  remote 
control  to  operate  the  arm.  The  rock  breaking  strategy  in  this  context  is  determined 
from  a  quick  visual  examination  of  the  rock  (i.e.  centre  hit  to  shatter  the  rock,  nibble 
the  sides  of  the  rock,  or  nudge  the  rock  to  let  it  fall  through).  The  available  time  is 
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limited  by  the  level  of  ore  in  the  hoper  below  the  grizzly  and  the  number  of  trucks 
queueing  at  the  bin  (typically  less  than  a  minute).  If  the  rock  cannot  be  broken  in 
time,  it  must  be  pushed  to  the  side  and  the  arm  returned  to  its  rest  position. 

The  objective  of  this  project  was  to  demonstrate  the  feasibility  of  effective  and 
safe  telerobotic  control  over  long  distances,  as  part  of  RioTinto’s  long  term  plan 
to  tele-operate  their  mining  operations.  This  distance  introduces  a  number  of  tech¬ 
nical  (communications  bandwidth  and  latency)  and  cognitive  (lack  of  spatial  and 
situational  awareness)  challenges  that  can  be  addressed  by  developing  technologies 
at  both  the  local  and  remote  ends  of  the  system.  Improving  the  intelligence  of  the 
control  system  at  the  remote  end  (i.e.  Cartesian  motion,  collision  avoidance)  can 
mitigate  the  effects  of  latency,  whilst  the  development  of  mixed  reality  user  inter¬ 
faces  (with  a  combination  of  live  video  and  3D  computer  visualization)  can  improve 
the  situational  awareness  for  the  operator. 


Fig.  1  Rock  breaker  (a)  at  rest  over  ROM  bin  (b)  breaking  a  rock  on  grizzly. 


Mixed  reality  arises  as  a  hybrid  solution  that  attempts  to  overcome  the  weak¬ 
nesses  of  the  two  extremes:  direct  visualisation  of  the  environment  by  cameras 
and  synthetic  visualisation  of  a  software  model  of  the  environment  by  computer 
graphics.  Cameras  provide  a  direct  representation  of  the  real  world  which  includes 
all  visible  features,  but  typically  only  from  a  limited  range  of  viewpoints.  Virtual 
representations  are  very  flexible  with  respect  to  viewing  parameters  and  manipula¬ 
tion,  but  can  only  include  information  that  is  represented  in  the  software  model  of 
the  environment,  and  only  to  the  limit  of  accuracy  to  which  the  dynamics  of  the  real 
environment  can  be  sensed.  Thus,  an  interface  that  mixes  the  two  paradigms  for  vi¬ 
sualisation  can  take  advantage  of  the  best  features  of  each,  while  overcoming  some 
of  the  disadvantages.  In  practice,  both  paradigms  provide  situational  awareness  me¬ 
diated  by  technology,  and  so  are  subject  to  failures  of  various  sorts.  Providing  mul¬ 
tiple  streams  of  awareness  incorporates  a  level  of  redundancy  that  protects  against 
some  failure  modes. 
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This  paper  is  divided  as  follows:  Section  2  provides  a  brief  background  to  tele¬ 
operation  over  the  Internet  and  its  applications  to  the  mining  domain;  Section  3 
describes  the  implementation  of  the  tele-robotics  system  to  a  mining  rock  breaker 
in  a  production  environmnet;  Section  4  describes  the  results  of  the  field  trials;  and 
Section  5,  concludes  with  a  discussion  and  proposal  for  future  work. 


2  Background 

Tele-operation  has  been  an  active  field  of  research  and  commercial  activity  for  a 
number  of  years  as  it  offers  a  means  of  isolating  an  operator  from  hazardous  or  un¬ 
inhabitable  environments  while  retaining  the  reasoning  powers  of  the  human  opera¬ 
tor.  It  has  a  long  history,  dating  back  over  sixty  years  to  a  “master- slave”  system[5]. 
Many  systems  have  subsequently  been  developed  for  underwater,  radioactive,  vol¬ 
canic,  and  outer  space  environments. 

During  the  late  1990s  there  was  a  great  deal  of  interest  in  tele-robotics  appli¬ 
cations  over  the  Internet  [7].  One  of  the  first  Web-based  tele-operation  projects[6] 
involved  a  mock-up  of  an  archaeological  site  situated  in  a  radioactive  area.  Users 
could  join  a  queue  to  take  control  of  a  2.5DOF  manipulator,  searching  for  ‘relics’. 
This  work  later  evolved  into  a  “tele-garden”’  system  in  which  users  could  remotely 
tend  to  a  garden.  To  avoid  latency  induced  instability  both  systems  used  supervisory 
control  to  specify  a  position  in  space[14].  Around  the  same  time,  researchers  de¬ 
veloped  a  Web-controllable,  6-axis  robot  that  allowed  operators  to  stack  toy  blocks 
by  controlling  the  gripper  and  Cartesian  position  of  the  arm  (again  using  supervi¬ 
sory  control)  [16].  This  system  was  subsequently  converted  to  a  ‘tele-laboratory’ 
allowing  students  to  perform  parts  of  their  coursework. 

With  respect  to  mining,  Ballantyne  et  al.  [1,  8]  investigated  the  use  of  virtual 
reality  displays  for  excavator  tele-operation.  The  display  enabled  the  operator  to 
pre-set  no-go  areas  for  the  excavator  and  to  also  mark  areas  of  the  excavation  site  in 
which  the  terrain  was  perhaps  too  dangerous  for  the  excavator  to  navigate.  Several 
Japanese  groups  have  also  been  investigating  tele-operation  of  mining  and  construc¬ 
tion  equipment  [15,  10,  9].  Although  research  was  conducted  to  develop  a  virtual 
reality  system  for  the  mining  industry  [2]  this  technology  was  never  commercially 
realized.  The  reasons  for  this  failure  is  unknown,  but  at  this  time,  the  technology 
was  immature  and  probably  did  not  provide  the  appropriate  level  of  immersion  and 
interaction  necessary  for  control  (i.e.  due  to  latency  and  bandwidth  issues). 

Advances  in  our  ability  to  develop  autonomous  systems  have  extended  the  pos¬ 
sibilities  for  very  high-level  task  specification,  moving  tele-operators  from  manual 
control  to  a  role  which  is  much  more  tactical  or  supervisory  [12].  These  layers  of  au¬ 
tonomy  introduce  different  requirements  for  the  human  machine  interface  [4] .  One 
of  the  main  criticisms  of  tele-robotics  is  that  it  does  not  provide  sufficient  situational 
awareness  [13]  to  the  human  operator  to  sustain  the  previous  manual  levels  of  pro¬ 
duction.  This  is  being  addressed  by  a  number  of  research  labs  across  the  world  (eg. 
NASA’s  Robonaut)  and  has  become  a  significant  research  activity  within  CSIRO. 
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3  Implementation 

The  rockbreaker  system  architecture  (see  Figure  2)  is  based  around  two  main  com¬ 
ponents  :  the  Remote  Control  System  (RCS)  located  at  the  West  Angelas  mine  and 

the  Telerobotic  User  Interface  (TUI)  more  than  lOOOkms  from  the  Remote  Opera¬ 
tions  Centre  (ROC)  in  Perth,  Western  Australia.  The  RCS  includes: 

•  CANBus  tilt  sensors  fitted  to  the  boom,  jib  and  hammer  and  absolute  encoder 
fitted  to  the  slew  axis  (see  Figure  3b); 

•  two  analogue  PTZ  cameras  and  a  fixed  wide  angle  camera  mounted  diagonally 
across  ROM  bin,  connected  to  three  high  speed  video  compression  units; 

•  two  pairs  of  Firewire  megapixel  stereo  cameras,  mounted  80cm  apart  and  several 
metres  above  the  ROM  bin  (see  Figure  3a); 

•  industrial  Ethernet  I/O  to  generate  voltages  to  drive  solenoids  and  measure  state 
of  hydraulic  control  pack; 

•  a  safety  PLC  that  monitors  the  access  in  the  rockbreaker  workspace,  and  actuates 
safety  relays  that  provide  power  to  the  control  unit; 

•  a  Linux  PC  to  run  control  software  and  an  XP  machine  to  run  stereo  acquistion 
software.  Both  machine  were  connected  to  the  mine  site  Intranet. 


Fig.  2  System  architecture  showing  components  at  the  (a)  remote  and  (b)  local  locations. 


The  control  software  is  based  upon  our  DDX  (Dynamic  Data  eXchange)  mid¬ 
dleware  [3].  It  is  split  into  a  number  of  specialized  modules.  At  the  top  is  a  com¬ 
munications  layer  that  provides  a  simple  web  interface  to  the  controller,  and  UDP 
communications  for  outgoing  state  information  and  incoming  demands.  The  advan¬ 
tage  of  UDP  is  that  it  provides  the  lowest  communications  latency  (which  is  an 
important  consideration  in  this  application)  at  the  expense  of  reliability.  At  the  next 
layer  is  the  trajectory  planner,  which  accepts  the  incoming  demands  and  plans  the 
arms  trajectory  (i.e.  it  is  able  to  convert  Cartesian  demands  into  a  sequence  of  joint 
space  velocity  demands).  Below  this  is  the  boom  controller  which  has  PID  loops 
on  four  joints  and  is  able  to  detect  and  alert  the  operator  to  the  joint  limits.  At  the 
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bottom  is  the  boom  server  which  sends  the  control  signal  to  the  Ethernet  10,  which 
in  turn  generates  the  control  voltages  for  the  proportional  directional  control  valves 
(solenoids)  at  the  base  of  the  rock  breaker.  These  solenoids  cannot  be  actuated  by 
the  computer  without  explicit  control  from  the  Safety  PLC.  Such  control  will  not 
be  given  unless  a  number  of  fail-safe  steps  are  taken,  including;  latching  the  access 
gates,  a  heartbeat  from  the  control  computer,  and  selecting  “Computer  Enable”  in 
the  control  room.  The  Safety  PLC  also  provides  access  into  the  site  Citect  system 
(which  controls  the  crusher). 


Fig.  3  Installed  hardware:  (a)  cameras  (b)  tilt  sensors  and  (c)  calibration  targets  in  ROM  bin. 


Particular  care  was  taken  to  select  hardware  that  could  survive  in  the  harsh  min¬ 
ing  environment.  In  summer,  the  ambient  temperature  can  exceed  50  deg  C,  and 
drop  below  zero  at  night.  The  iron  ore  dust  is  particularly  abrasive  and  can  easily 
damage  electronics.  Since  the  arm  dimensions  were  known,  it  was  possible  to  use 
the  estimated  position  of  the  hammer  tip  itself  to  measure  the  dimensions  of  the 
ROM  bin  (which  were  different  from  the  mine  plan).  These  dimensions  were  then 
used  to  place  visual  markers  (see  Figure  3c)  in  the  ROM  bin  that  were  then  used  to 
calibrate  the  seven  cameras.  This  meant  that  the  arm,  cameras  and  ROM  bin  were  all 
measured  in  the  same  frame  of  reference.  This  frame  of  reference  was  used  by  the 
collision  detection  module  (using  openGL)  to  detect  collisions  between  the  model 
of  the  ROM  bin  and  the  arm. 

Nodding  lasers  were  initially  proposed  to  acquire  the  3D  surface  of  the  rocks  in 
the  ROM  bin,  but  after  discussions  with  the  operators,  we  found  that  they  rely  heav¬ 
ily  on  the  texture  and  colour  of  the  rocks  when  deciding  upon  a  breaking  strategy. 
A  second  computer  was  used  subsequently  to  acquire  and  process  high  resolution 
stereo  images.  A  3D  surface  was  generated  from  advanced  photogrammetry  tech¬ 
niques  (commercial  product  developed  by  CSIRO  called  Sirovison).  To  reduce  the 
effects  of  stereo  shadow  a  second  pair  of  cameras  was  mounted  diagonally  across 
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the  ROM  bin.  To  cope  with  the  extreme  lighting  conditions  (dark  shadows  across  the 
bin  in  the  morning  and  evening)  contrast  enhancement  was  achieved  with  exposures 
bracketing.  In  practice,  the  system  is  able  to  generate  3D  surfaces  with  cm  resolu¬ 
tion.  Once  the  3D  surface  has  been  acquired  it  is  converted  to  X3D  and  sent  to  the 
TUI  for  rendering  onto  the  3D  virtual  screen  (see  Figure  4).  One  future  advantage  of 
using  this  photogrammetry  software  is  that  it  also  has  the  ability  to  recognize  joint 
sets  and  fracture  surfaces  -  a  valuable  feature  for  future  automation. 


Fig.  4  Three  video  screens  and  augmented  virtuality  of  rock  breaker  overlaid  with  3D  rock  surface. 


One  of  the  shortcomings  of  the  previous  work  has  been  the  ad-hoc  nature  in 
which  the  user  interfaces  have  been  developed.  Whilst  components  based  robotics 
(such  as  Player/Stage,  ORCA  and  Microsoft  Robotic  Studio)  and  Web-based  tool¬ 
boxes  for  Lab  View  and  Matlab  have  moved  to  the  mainstream,  the  user  interfaces 
have  not  provided  the  level  of  immersion  necessary  to  provide  sufficient  situational 
awareness  to  control  dangerous  and  expensive  equipment  in  remote  and  unstruc¬ 
tured  environments.  Some  researchers  have  proposed  a  framework  based  upon  gam¬ 
ing  technology  [11]  which  we  have  (i.e.  using  Second  Life  to  control  a  simulation  of 
the  rock  breaker)  and  will  continue  to  look  into  (i.e.  Unity).  The  proposed  solution 
for  this  project  has  been  to  use  AJAX3D  which  merges  X3D  and  AJAX  techniques. 
X3D  is  an  ISO  standard  for  real-time  computer  graphics  that  can  be  viewed  with 
the  appropriate  viewer  (eg.  Flux  from  Mediamachines).  Multimedia  streams  can 
be  placed  onto  surfaces  in  the  environment  (e.g.  video  onto  billboards).  The  X3D 
viewer  can  support  audio,  stereo  displays  and  haptic  devices.  Being  an  open  stan¬ 
dard  there  are  a  number  of  CAD  systems  that  are  able  to  export  drawings  to  X3D. 
With  AJAX  calls  to  the  DOM  (Document  Object  Module)  or  SAI  (Scene  Authoring 
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Interface)  it  is  possible  to  partially  load  parts  of  the  scene  (i.e.  update  the  world)  or 
reload  features  of  the  scene  (i.e.  a  joint  angle  on  the  rock  breaker). 

A  photograph  of  the  TUI  in  operation  at  the  ROC  is  shown  in  Figure  2b.  In  front 
of  the  operator  there  are  two  screens:  on  the  right  is  the  Citect  system  that  is  used 
to  monitor  the  movement  of  the  ore  from  the  crusher  down  to  the  stacker/reclaimer; 
on  the  left  is  the  user  interface  designed  for  the  rock  breaker  control.  It  consists  of 
the  four  windows  (see  Figure  4):  three  video  screens  and  an  augmented  virtuality  (a 
3D  computer  graphics  scene  that  includes  some  elements  captured  directly  from  the 
real  world).  1  The  operator  is  able  to  control  the  rock  breaker  with  the  mouse  and 
the  gaming  joystick  (Logitec  RumblePad).  Projected  onto  a  screen  above  the  two 
monitors  are  four  video  streams  from  various  locations  around  the  rock  breaker: 
one  to  monitor  approaching  trucks  and  another  to  monitor  the  state  of  the  secondary 
crusher.  Speakers  on  either  side  of  the  screen  reproduce  the  sounds  made  at  the  rock- 
breaker.  This  is  a  very  important  indicator  of  the  state  of  the  machine:  the  operator 
is  able  to  hear  the  sirens  that  indicate  that  the  machine  is  powered;  and  the  sound  of 
the  hammer  can  also  be  used  into  indicate  whether  the  hammer  is  making  contact 
with  the  rocks  (i.e.  dry  firing).  Access  to  the  mine  communication  systems  (RF  ra¬ 
dio)  is  provided  via  a  microphone/headset.  With  this  the  operator  is  able  to  inform 
the  fleet  management  system  of  the  availability  of  the  crusher. 

Once  the  user  has  established  network  communications  with  the  RCS,  most  com¬ 
mands  are  accepted  through  the  RumblePad.  The  right  hand  trigger  button  is  used 
as  a  command  validation  switch  (deadman).  The  movement  of  the  arm  being  dis¬ 
abled  when  the  switch  is  off.  The  left  hand  button  over-rides  the  collision  detection 
system  to  allow  the  operator  to  move  the  arm  close  to  the  ROM  bin.  This  is  typ¬ 
ically  used  for  the  cleanup  operations.  When  hydraulic  pressure  is  requested,  the 
RCS  expects  a  heartbeat  at  10Hz.  The  RCS  will  disable  hydraulics  after  a  specified 
number  of  heartbeats  are  missed  (in  this  case,  20  heartbeats  or  2  seconds).2  In  the 
TUI,  the  operator  is  able  to  select  different  modes  of  control.  They  can  select  in 
either  velocity  or  position  mode  in  joint,  Cartesian  or  backhoe  space  (a  combination 
of  joint  and  Cartesian).  The  two  joystick  controls  on  either  side  of  RumblePad  are 
used  to  control  the  motion  of  the  arm.  The  arm  can  also  be  sent  to  pre-configured 
set  points  (i.e.  Home,  Park  etc.)  or  requested  to  move  to  a  selected  location  on  the 
3D  rock  surface.  The  user  interface  is  designed  around  the  principal  that  there  is 
only  one  primary  view  that  determines  the  behaviour  of  the  controls.  For  example, 
when  the  PTZ  video  is  the  primary  display,  then  the  arrow  buttons  (top  left  of  the 
RumblePad)  are  able  to  pan  and  tilt  the  camera.  However  if  the  primary  screen  is 
the  virtual  screen,  the  arrow  keys  move  you  up  and  over  the  ROM  bin  (eg.  jet-pack 
mode).  The  selection  of  primary  screen  is  controlled  by  the  numbered  buttons. 


1  The  system  was  designed  so  that  the  layout  and  size  of  the  screens  could  be  modified  upon 
request.  In  the  trial,  rather  than  the  original  design  of  a  primary  screen  with  three  thumbnails,  the 
operator  preferred  four  screens  of  equal  size  with  minimal  decoration. 

2  Measured  over  several  days,  the  average  round  trip  time  was  56  milliseconds  with  a  standard 
deviation  of  3  milliseconds.  On  average,  there  were  3  pings  per  day  that  lasted  more  than  300  mil¬ 
liseconds.  The  video  compression  (IndigoVision)  ranged  in  bandwidth  from  339  Kbps  to  1238Kbps 
at  25fps  4SIF.  The  state  and  demand  UDP  traffic  consumed  little  bandwidth. 
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4  Field  Trials 

Field  trials  were  conducted  in  mid  December  2009.  The  trial  consisted  of  three  12- 
hour  shifts  over  3  days  during  normal  production  runs.  Two  of  the  shifts  started  at 
4am  to  allow  for  testing  in  night  conditions.  We  used  two  operators,  one  at  each 
end  of  the  system.  The  operator  based  at  the  ROC  in  Perth  had  not  been  trained 
or  introduced  to  the  TUI  prior  to  commencing  the  trials.  A  second  operator  was 
present  in  the  control  room  at  the  mine  site  to  supervise  the  rockbreaker  operation 
and  intervene  in  case  of  emergency. 

During  the  field  trials  we  were  able  to  remotely  replicate  the  work  flow  of  the 
local  human  operator.  When  the  operator  is  alerted  to  the  presence  of  a  large  rock, 
the  operator  is  presented  with  an  overview  of  the  rockbreaker  from  a  wide-angle 
video  stream  and  augmented  virtuality  (see  Figure  4).  The  remote  operator  is  able  to 
“walk  around  the  rockbreaker  to  inspect  the  rocks  from  different  angles.  Once  they 
have  established  the  appropriate  breaking  strategy,  the  operator  is  able  to  deploy  the 
arm  with  the  joystick.  As  the  arm  is  commanded  to  move,  the  motion  of  the  arm 
is  replicated  in  the  3D  scene.  Simultaneously  both  PTZ  cameras  follow  the  tip  of 
the  hammer.  When  the  operator  is  ready  to  break  the  rock,  they  can  switch  their 
attention  to  the  live  video  stream,  which  they  can  use  to  monitor  the  breaking  of  the 
rock.  Once  complete,  the  arm  can  be  automatically  sent  to  the  rest  position. 

Within  half  an  hour  of  introducing  him  to  the  TUI,  the  operator  based  at  the  ROC 
in  Perth,  was  breaking  rocks.  At  first,  the  operator  was  unsure/sceptical  of  the  game 
like  controls,  however  after  some  experience  with  the  new  interface  they  were  happy 
to  accept  the  device.  At  the  end  of  the  trial  the  operator  expressed  the  opinion  that 
the  deployment  of  the  arm  was  faster  for  breaking  “simple”  rock  configurations,  but 
difficult  to  deal  with  complex  rock  configurations  (where  rocks  are  packed  on  top  of 
one  another).  The  operator  made  several  useful  suggestions  for  changes  to  the  user 
interface  that  would  address  this  problem  (i.e  faster  3D  update  of  rock  profile,  and 
manual  over  ride  of  zoom  control).  Over  the  three  days  the  operator  was  able  to  deal 
with  all  of  the  rocks  without  measurable  disruption  to  production.  However  there 
were  two  safety  incidents.  In  one  case,  there  was  a  communications  dropout,  and 
in  a  second  incident  the  operator  moved  the  arm  into  the  wall  as  a  result  of  using 
a  forth  camera  mounted  on  a  hill  over  200m  away  (this  camera  was  not  part  of  the 
rockbreaker  system)  and  became  confused  with  motion  of  the  arm.  At  no  time  was 
any  damage  done  to  the  arm  or  the  ROM  bin. 


5  Discussion  and  Future  Work 

Perhaps  the  most  significant  comment  made  by  the  operator  was  when  we  asked 
what  was  the  difference  between  local  and  remote  operations.  His  reply  was  ‘“In 
local  operations,  I  can  concentrate  on  the  task  at  hand,  and  my  peripheral  senses 
deal  with  everything  else.  When  I’m  remote,  I’m  forced  to  redirect  my  attention  to 
each  screen/window  that  is  front  of  me.  This  distracts  me  from  what  I  am  doing.”  . 
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The  current  generation  of  control  room  used  in  the  mining  industry  contains  a 
number  of  custom  built  user  interfaces:  typically  one  for  each  mining  process  that 
needs  to  be  monitored.  To  reduce  the  cognitive  load  of  switching  from  one  interface 
to  another,  we  believe  that  the  operator  should  be  presented  with  a  single  interface. 
This  interface  should  be  highly  immersive,  interactive  and  reconfigurable. 

The  primary  goal  of  our  future  work  is  the  development  and  delivery  of  a  Mixed 
Reality  Framework  that  will  provide  a  unified  user  interface  for  accessing  and  in¬ 
teracting  with  key  areas  of  the  mining  operation.  The  intention  of  this  work  will 
not  be  to  replace  the  existing  remote  control  systems  but  to  provide  a  portal  to 
access  various  third-party  systems  via  the  Mixed  Reality  Framework.  To  use  the 
web-browser  metaphor,  which  is  used  to  move  from  one  web  page  to  another,  this 
system  will  provide  a  Mixed  Reality  Browser  that  will  enable  personnel  to  browse 
the  state  of  various  mining  processes  in  a  3D  context  in  conjunction  with  existing 
2D  interfaces. 

In  the  context  of  teleoperation,  “mixed  reality”  can  be  used  to  refer  to  interfaces 
that  mix  the  different  pathways  to  visualisation  -  direct  visualisation  via  video  and 
synthetic  visualisation  derived  from  a  dynamic  software  model  of  the  state  of  the 
world.  Further,  for  this  mixing  to  be  effective,  it  must  be  based  on  information  about 
the  relationship  between  the  pathways.  In  particular,  the  cameras  themselves  (and 
mobile  camera  platforms)  must  be  modeled  in  some  way,  and  the  camera  models 
may  also  be  dynamically  updated  from  sensor  information.  We  refer  to  the  combi¬ 
nation  of  all  these  models,  and  the  relationships  between  them,  as  the  composite 
situation  model.  Several  things  need  to  be  modelled  in  software  as  part  of  the  pro¬ 
cess  of  situation  representation.  These  models  may  exist  on  the  same  computer  as 
the  user  interface,  or  they  may  be  accessed  from  a  centralised  world  model  ’in  the 
cloud’ . 

Situation  awareness  through  visualisation  is  only  one  aspect  of  teleoperation.  The 
actual  operation  and  control  of  the  remote  machine  must  also  be  supported  through 
the  interface.  A  mixed  reality  interface  creates  opportunities  for  control  paradigms 
based  on  direct  selection  and  manipulation  of  objects  within  the  interface.  This  in¬ 
cludes  real  objects  that  have  been  visualised  directly  or  synthetically,  or  virtual  ob¬ 
jects  that  have  been  added  explicitly  for  the  purpose  of  interaction.  In  each  case, 
the  same  selection  and  manipulation  techniques  could  be  used.  Just  as  a  mixed  re¬ 
ality  interface  incorporates  a  mixture  of  pathways  for  situation  visualisation,  it  will 
also  incorporate  a  mixture  of  operation  pathways:  direct  operation  (for  example,  the 
position  of  a  joystick  controls  the  degree  of  opening  of  a  valve),  and  indirect  or 
synthetic  operation  (for  example,  a  virtual  version  of  the  arm  is  moved  to  a  posi¬ 
tion,  then  commands  are  sent  to  move  the  real  arm  to  that  position).  Mixed  Reality 
Interaction  represents  a  fertile  area  for  investigation. 
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Abstract  A  method  is  developed  that  improves  the  accuracy  of  super-resolution 
range  maps  over  interpolation  by  fusing  actively  illuminated  HDR  camera  im¬ 
agery  with  LIDAR  data  in  dark  subterranean  environments.  The  key  approach  is 
shape  recovery  from  estimation  of  the  illumination  function  and  integration  in  a 
Markov  Random  Field  (MRF)  framework.  A  virtual  reconstruction  using  data  col¬ 
lected  from  the  Bruceton  Research  Mine  is  presented. 


1  Introduction 

Mine  accidents  including  those  at  Quecreek,  Sago  and  Crandall  Canyon  highlight 
the  urgency  of  estimating  accurate  3D  geometry  in  mines.  Systems  have  been  em¬ 
ployed  to  map  mines,  from  virtual  reality  systems  for  training  rescue  personnel  [1] 
to  automated  survey  robots  and  post  accident  investigation  [2].  While  many  of 
these  systems  use  state-of-the-art  direct  range  measurement  sensors,  LIDAR  sen¬ 
sors  alone  cannot  meet  the  resolution,  size,  power  or  speed  requirements  to  pro¬ 
duce  quality  mine  maps  in  a  practical  amount  of  time. 

This  research  combines  absolute  range  sensor  data  with  high-resolution  CCD 
imagery  in  a  novel  manner  to  achieve  a  quantitative  increase  in  range  data  accu¬ 
racy  and  density.  In  particular,  the  method  targets  application  in  artificial  subter¬ 
ranean  voids  where  assumptions  can  be  used  to  constrain  the  image  formulation 
problem.  As  both  color  and  geometric  information  are  of  interest,  cameras  and 
range  sensors  commonly  exist  on  modeling  platforms  [2].  Integration  of  the 
method  presented  here  requires  only  calibration  and  low  processing  overhead. 

The  results  from  field  experimentation  in  a  working  mine  are  discussed  in  de¬ 
tail.  A  dense  visualization  technique  enabling  mesh  quality  models  to  be  displayed 
and  updated  in  real-time  on  GPU  hardware  is  explored.  Lastly,  a  generalization  of 
the  method  to  similar  domains  in  field  robotics  is  made. 
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2  Prior  Work 

The  fusion  of  range  and  imaging  sensors  to  improve  3D  model  quality  has  been 
studied  in  depth  [3, 4, 5, 6].  A  general  model  for  fusing  raw  LIDAR  and  image  data 
into  super-resolution  range  images  using  a  Markov  Random  Field  (MRF)  was  ex¬ 
plored  in  Diebel  and  Thrun’s  seminal  paper  [4].  MRFs  are  undirected  graphs  that 
represent  dependencies  between  random  variables  and  have  been  used  extensively 
in  computer  vision  for  noise  removal,  feature  matching,  segmentation  and  inpaint¬ 
ing  (see  [3]).  The  popularity  of  the  MRF  stems  from  the  ability  to  model  complex 
processes  using  only  a  specification  of  local  interactions,  the  regular  grid  nature  of 
CCD  images  and  the  maximum  a  posteriori  (MAP)  solution  requiring  only  direct 
convex  optimization  in  many  cases. 

Diebel  and  Thrun  surmised  that  higher  resolution  intensity  (color)  data  could  be 
used  to  texture  range  images  and  increase  the  range  accuracy  of  interpolated 
points.  The  results  in  a  uniformly  and  sufficiently  illuminated  regular  office  envi¬ 
ronment  are  quite  compelling.  Cameras  are  able  to  turn  LIDAR  scans  into  dense 
range  images  with  very  low  computational  overhead.  However,  the  assumption 
that  an  image  provides  relative  range  information,  even  locally,  is  tenuous  in  un¬ 
structured  environments.  Generating  3D  geometry  from  a  general  2D  projection  is 
an  ill-posed  problem.  The  ability  of  Diebel’ s  method  to  smooth  point  clouds  using 
areas  of  flat  image  information  was  convincingly  shown,  but  the  converse  of  en¬ 
hancing  a  point  cloud  using  image  texture  was  not.  Recent  research  in 
range/camera  fusion  using  MRFs  include  [5,6];  all  of  which  also  target  indoor  ap¬ 
plication. 

This  research  extends  MRF-based  super-resolution  to  subterranean  environ¬ 
ments  such  as  mines,  caves,  lava  tubes  and  sanitary  pipes.  These  environments 
have  unknown  but  slowly  varying  albedos  with  a  dominant  diffuse  reflectance 
term.  These  naturally- dark,  enclosed  spaces  also  require  active  illumination  to  im¬ 
age,  enabling  the  use  of  calibrated  lighting.  With  these  assumptions  we  are  able  to 
provide  a  stronger  depth  estimate  for  texturing  the  interpolated  LIDAR  data. 


3  Markov  Random  Field  Framework 

A  range  image  is  used  as  the  common  representation  for  fusion.  The  3D  range 
cloud  data  is  registered  to  the  pinhole  of  the  camera,  forming  a  range  map  (R)  via 
projection  of  distances  onto  the  nxm  image  plane  at  equivalent  resolution.  Many 
pixels  in  the  range  map  will  not  contain  range  measurements;  these  holes  are  filled 
from  nearby  data  through  bilinear  or  nearest  neighbor  interpolation.  The  color  im¬ 
age  data  can  be  then  converted  to  intensity  values  or  used  as  a  raw  RGB  vector 
( I ).  A  lattice  MRF  is  formed  where  there  is  a  single  range  and  intensity  meas¬ 
urement  associated  with  each  node.  We  propose  an  MRF  fusion  method  similar  to 
that  documented  in  [4]  that  numerically  integrates  the  image  gradient. 
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The  range  map  potential  (3.1)  promotes  agreement  between  the  estimated  vari¬ 
ables  and  the  interpolated  range  data.  The  smoothness  prior  (3.2)  regularizes  large 
changes  in  the  range  estimate  and  like  the  image  potential  (3.3)  connects  potential 
transfer  from  a  node  to  its  neighbors. 


(3.1) 

i&L 

^=/?Z  Z  (x,--x,  y 

(3.2) 

i&L  jeN(i) 

=  «Z  Z  (^+V/,y-x,.)2 

(3.3) 

isL  jeN(i) 

a  =  w2  exp(-c-  <t) 

/?  =  vv3  (l-exp(-c<r)) 

(3.4) 

The  image  gradient  is  a  reasonable  predictor  of  depth  change  across  neighbor¬ 
ing  pixels.  However,  integrating  the  gradient  to  produce  depths  over  a  large  local¬ 
ity  is  prone  to  drastic  shape  distortions.  The  range  estimate  can  be  used  to  regular¬ 
ize  numerical  integration  of  the  intensity  gradient.  The  weights  a  and  j3  are 
relatively  scaled  by  an  interpolation  distance  uncertainty  ( a )  for  some  weights 
w2  and  w3  (3.4).  a  can  be  generated  from  the  range  image  during  inpainting  by 
using  the  Matlab  command  BWDIST,  for  example. 

p(xl^,/,<7)  =  iexp(-|(T  +  Q  +  0))  (3.5) 

xMAP  =  arg  min ,  /('P  +  <1>  +  £2)  (3.6) 

Solving  for  the  MAP  of  the  distribution  requires  running  a  gradient  descent  al¬ 
gorithm  on  the  target  variables  x  in  3. 5-3. 6,  where  Z  is  the  partition  function  [4]. 


4  Structure  from  Shading 

The  image  gradient  VT.  in  (3.3)  can  apply  to  either  raw  pixel  data  or  better  esti¬ 
mates  of  depth  from  the  camera.  As  scene  geometry  cannot  be  ascertained  from  a 
single  image  without  assumptions,  often  no  better  estimate  exists.  Definite  recon¬ 
struction  requires  knowledge  of  image  formation  parameters  like  light  field,  sur¬ 
face  reflectance  (BRDF)  and  albedos.  However,  if  assumptions  like  those  com¬ 
monly  made  in  Shape-from-Shading  are  valid,  the  number  of  unknowns  is  greatly 
reduced. 

The  illumination  and  reflectance  assumptions  are  appropriate  for  subterranean 
environments.  Most  dry  underground  mines  and  caves  are  located  in  Lambertian 
rock  and  many  coal  mine  interiors  are  additionally  covered  with  diffuse  material 
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like  Shotcrete  [7].  Low  amounts  of  metallic  meshing,  industrial  equipment,  water 
and  retro-reflectors  are  present,  but  the  contribution  of  these  specular  surfaces  can 
be  reduced  using  the  method  documented  below  and  in  [10].  Robots  in  these  natu¬ 
rally  dark  environments  can  be  fitted  to  carry  small  area  light  sources  for  photog¬ 
raphy  which  produce  simple  light  fields. 

The  MRF  image  observation  ( I )  is  estimated  using  Shape-from-Shading  given 
the  above  assumptions.  A  lightness-based  direct  normal  estimation  method  which 
uses  range  information  is  given  below,  but  other  techniques  exist  [8,9].  This 
method  factors  range  information  to  allow  varying  albedos  and  trades  accuracy  for 
feature  preservation.  The  effect  of  the  light  source’s  irradiance  fall-off  is  first  re¬ 
moved  from  the  raw  image  data  (E0).  We  assume  the  following  irradiance  correc¬ 
tion  model  for  small  area  sources  (4.1): 

EmUase,=r(E0)-Rn  (4.1) 

The  radiometric  function  (/)  maps  pixel  values  to  irradiance,  (R  )  is  the  inter¬ 
polated  depth  estimate  and  (n)  is  the  irradiance  fall-off  factor.  For  ideal  point 
sources  n  =  2.0 ,  while  n  <  2.0  for  near-field  area  sources.  The  experimental 
setup  described  below  exhibits  an  empirical  decay  of  71  =  1.265  .  The  corrected 
image  ( Ec )  is  devoid  of  a  near- field  illumination  intensity  bias. 

Converting  RGB  color  into  a  single  intensity  value  provides  compactness  and 
symmetry,  and  also  minimizes  chromaticity  effects.  Color  space  transformations 
such  as  CieLAB  or  YCbCr  are  often  used  to  heuristically  isolate  the  lightness 
component  of  an  image,  discarding  chromaticity  and  albedo.  Zickler’s  SUV  trans¬ 
formation  [10]  describes  a  class  of  physics-based  specular-invariant  color  spaces 
produced  by  rotating  the  RGB  space  such  that  a  single  channel  is  aligned  with  the 
illuminant  color  vectors.  This  method  has  produced  excellent  results  with  single¬ 
source  images  and  enables  many  Lambertian  algorithms  to  handle  a  large  set  of 
environments  with  specularities.  The  specular  invariant  image,  as  defined  in  eqs. 
4.2-4.3,  is  used  in  experimentation: 

[s,u,vf  =Rr(e){E^,E%M,E%MJ  (4.2) 

Etm=Ju1+v2  (4.3) 

Rr{0)  is  defined  as  a  (3x3)  rotation  matrix  that  aligns  the  red  channel  of  an 
{r,  g,b}  triple  with  the  source  color.  The  magnitude  of  the  {w,v}  components  is 
taken  to  be  the  diffuse  image. 

An  albedo  map  is  subsequently  generated  from  the  diffuse  image  using  Blake’s 
method  for  lightness  computation  [11].  Perceived  intensity  is  a  multiplicative  rela¬ 
tionship  between  surface  slant  angle  and  reflectance.  The  log  image  separates 
these  components  into  additive  terms.  Scene  albedos  can  be  recovered  from  the 
gradient  of  the  log  diffuse  image  by  thresholding  to  remove  small  changes  and  in- 
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tegrating.  It  is  noted  that  the  problem  can  be  recast  as  finding  the  log  albedo  map 
( 8 )  that  minimizes  the  following  [11]: 

afg  min,  \i> 8 - Ta  ( £  log  Einv  )f  +  |i  S -  Ta  (flog  Einv  )|  (4.4) 

where  ( Ta )  is  the  threshold  function.  Exponentiating  ( S )  with  the  proper  con¬ 
stant  of  integration  produces  the  albedo  values  (4.5).  The  constant  can  be  esti¬ 
mated  from  the  range  data  to  minimize  depth  discrepancy  in  the  reconstruction. 

Pes,  =  exP(£  +  C)  (4.5) 

Einv  =  P\n\\l\COS{ffnl)  (4-6) 

Q„i  =  arccos  (fj )  (4.7) 

The  polar  estimates  ( Gnl )  are  combined  with  azimuth  estimates  {</>)  from  the 
range  image  and  converted  to  gradients  for  integration  in  the  MRF. 


5  Experimental  Results 

The  experimental  setup  uses  both  a  continuously  rotating  planar  LIDAR  scanner 
and  an  8  megapixel  DSLR  camera  mounted  to  a  mine  robot.  A  small  area  light 
source  is  also  mounted  along  the  same  axis  to  minimize  cast  shadows  in  the  im¬ 
age.  This  replaces  the  normal  flood  lighting  for  the  imager.  The  scanner  has  a 
practical  throughput  of  -40,000  points  per  second.  The  points  are  aligned  along 
concentric  rings  with  0.5°  angular  separation  in  a  180°  hemisphere  in  front  of  the 
unit.  The  camera  takes  hemispherical  images  using  a  constant  angular  resolution 
fisheye  lens  with  a  182°  field  of  view.  The  sensor  mounting  configuration  and  ex¬ 
ample  data  are  shown  in  Fig.  1  below. 


Fig.  1.  (Left)  Experimental  setup  with  1.  LIDAR  scanner.  2.  Fisheye  Camera,  3.  Light  Source. 
(Center)  Raw  fisheye  imagery.  (Right)  Ground  truth  range  image. 


Thirty  complete  datasets  consisting  of  LIDAR  scans,  High  Dynamic  Range 
(HDR)  imagery  and  robot  odometry  were  collected  from  the  Bruceton  Research 
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Coal  Mine  in  Pittsburgh,  PA.  LIDAR  scans  averaged  600,000  points.  HDR  images 
were  each  generated  from  a  series  of  5  images  corresponding  to  exposures  times 
of  {%,  Yi,  1,  2,  4}  seconds  using  the  method  described  in  [12].  The  1.0  second  ex¬ 
posure  image  was  used  as  the  Low  Dynamic  Range  (LDR)  reference  image  for 
analysis.  An  additional  16  datasets  of  LDR-only  imagery  were  also  collected. 

A  ground  truth  range  map  was  generated  for  each  LIDAR  scan  using  the  full 
point  cloud.  Multiple  measurements  mapping  to  the  same  pixel  were  averaged. 
The  scans  were  subsequently  down-sampled  to  25,000  points  and  interpolated  into 
a  range  image  for  testing  the  method.  The  datasets  were  further  partitioned  into  25 
test  sets  and  5  training  sets.  Optimal  weighting  factors  were  learned  using  a  sim¬ 
plex  search  on  the  training  set,  while  validation  occurred  in  the  test  set. 

The  proposed  method  was  compared  against  Diebel’s  method  and  raw  interpo¬ 
lation.  The  mean  per-pixel  L{  norm  (Manhattan  distance)  between  the  recon¬ 
structed  range  map  and  the  ground  truth  map  was  used  as  a  benchmark  for  com¬ 
parison.  Ground  truth  data  points  outside  the  convex  hull  of  LIDAR  values  in  the 
interpolated  map  were  discarded  due  to  skew  in  scoring  extrapolated  points.  The 
usable  pixel  area  is  determined  for  each  scan  by  the  number  of  saturated  pixels, 
the  range  image  convex  hull  and  removal  of  high-gradient  probable  error  values. 


Algorithm  Performance 


Fig.  2.  Reconstruction  Improvement  vs.  Raw  Interpolation. 


The  results  of  the  experiment  are  summarized  in  Table  1  and  Fig.  2.  Fig.  3  and 
Fig.  4  show  an  example  reconstruction  from  a  single  view  point.  The  scene  fea¬ 
tures  a  yellow  nylon  mine  curtain  on  the  left  side,  wooden  cribbing  stacks  on  the 
right  and  aluminum  meshing  integrated  into  a  mostly  exposed  ceiling. 

Table  1.  Summary  of  Results 


Quantity 

Details 

Total  Test  Datasets 

41 

{HDR,  LDR-only}  Datasets 

{25, 16} 

Interpolation  Improvement 

Mean 

12.2% 

Max,  Min 

19.2%,  3% 

Density  Statistics 
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LIDAR  downsample 

25,000  points 

Ground  Truth  LIDAR 

669,834  points 

Mean  Resultant 

1,045,358  points 

Mean  Increase 

41.8  x 

Image  Usability  Information 

LDR  Saturated 

3.17%  of  total  pixels 

HDR  Saturated 

4.20  x  1  O'2  %  of  pixels 

HDR  Accuracy  Increase 

20.5%  over  LDR-only 

HDR  Density  Increase 

51.5%  over  LDR-only 

Fig.  3.  Point  Cloud  of  Cribbing.  Low  resolution  cloud  (left)  and  high  resolution  reconstruction 
from  algorithm  (right)  showing  stacked  timbers  supporting  the  roof. 


Fig.  4.  Colorized  3D  Reconstruction.  Full  scene  (left)  and  mine  curtain  detail  (inset  and  right). 

Additional  data  of  two  corridors  were  also  collected  at  the  Bruceton  Mine 
along  evenly  spaced  intervals  roughly  3  meters  apart.  Using  robot  odometry  and 
Iterative  Closest  Point  (ICP)  alignment,  multiple  scans  were  up- sampled  using  the 
proposed  technique,  fused  together  and  color/illumination  compensated.  These 
models  represent  some  of  densest,  most  comprehensive  mine  reconstructions  to 
date  using  a  mobile  robot.  The  results  appear  below: 

Table  2.  Corridor  Modeling  Statistics. 


Model  # 

#  of  Scans 

#  of  Images 

#  Points 

1 

4 

16 

5,543,451 

2 

8 

32 

9,680,105 

Fig.  5.  Mine  Corridor  ICP  model.  (1)  External  view.  (2)  Internal  view  with  rail  tracks. 


The  results  are  displayed  using  a  hole-filling  method  similar  to  the  multi- scale 
push-pull  technique  in  [13].  This  display  system  is  adapted  to  benefit  from  high 
density  clouds  generated  using  super-resolution  methods.  Point  clouds  are  ren¬ 
dered  with  push-pull  interpolation  in  image  space.  A  min-depth  check  and  kernel 
density  estimator  are  used  to  resolve  edge  discontinuities  and  remove  occluded 
background  measurements.  The  utilization  of  texture  in-painting  for  both  color  in¬ 
terpolation  and  depth  reconstruction  provides  the  viewer  with  graphical  continuity 
as  well  as  proper  occlusions,  which  standard  point  displays  lack.  In  addition  to  fast 
rendering  of  huge  datasets,  the  Tenderer  allows  the  model  to  be  updated  in  real 
time  as  new  data  arrives  without  costly  re-meshing  operations.  The  system  can 
generate  real-time  (>30Hz)  imagery  at  1080p  HD  resolution  on  commodity  (Ge¬ 
Force  GTX  260)  hardware  with  point  clouds  of  greater  than  5  million  points 


7  Analysis 

The  results  show  that  the  method  increases  interpolation  accuracy  by  up  to  20% 
on  the  Bruceton  Mine  data,  with  an  average  improvement  of  12%.  The  fisheye- 
spinner  setup  features  density  increases  up  to  70  fold,  with  an  average  of  40x  in¬ 
crease  in  density  (Table  1).  Of  note  is  that  real  resolution  is  created  where  LIDAR 
beam  physics  dictate  a  maximum  angular  resolution.  This  is  apparent  in  3D  scan- 


9 


ning  mechanisms  that  actuate  a  planar  sensor,  where  an  increase  in  data  collection 
time  results  in  diminishing  resolution  returns. 

To  validate  that  true  information  is  being  stored  in  the  interpolated  values,  a 
sliding- window  15x15  pixel  Pearson  correlation  was  performed.  As  shown  in  Fig. 
6,  the  shaded  image  provides  significant  information  about  the  ground  truth  that  is 
not  contained  in  interpolation.  The  fused  range  map  correlates  more  than  either 
source  individually,  concurring  with  the  error  estimation  benchmark.  While  Die- 
bel’ s  method  shows  a  numerical  increase  in  accuracy,  it  is  not  statistically  signifi¬ 
cant.  This  is  corroborated  by  almost  equal  amounts  of  strongly  negative  and  posi¬ 
tive  correlation  in  the  raw  image  data. 


Fig.  6.  (Left  to  right)  Roof  supports  covered  in  Shotcrete,  Image  to  ground  truth  correlation, 
Shaded  image  to  ground  truth  correlation,  and  Reconstruction  error  reduction.  Scale  is  brown  to 
white  over  [-1,  1]  for  correlation  and  navy  blue  to  red  over  [-0.025m,  0.05m]  for  error  reduction. 


The  method  encounters  several  drawbacks  that  prevent  the  fused  result  from 
achieving  the  same  accuracy  as  LIDAR  scans  of  equivalent  density.  Resulting 
range  images  are  vulnerable  to  artifacts  typical  of  raw  interpolation,  although  to  a 
lesser  degree.  Most  reconstruction  error  occurs  at  occlusion  edges  where 
neighboring  LIDAR  points  have  large  disparities.  Regularization  terms  tend  to 
over-smooth  these  edges  and  shading  cues  are  ill-behaved  due  to  cast  shadows, 
among  other  reasons  [9,11].  Attempting  to  isolate  these  specific  edges  in  the  im¬ 
age  is  difficult  due  to  image  noise,  lighting  and  material  specific  effects  and  is  not 
addressed  in  this  research  (see  [5,14]). 


8  Conclusion 

A  method  was  presented  that  fuses  actively  illuminated  CCD  imagery  and  LIDAR 
data.  The  method  demonstrates  increases  in  range  accuracy  of  up  to  20%  on  ex¬ 
perimental  data  over  interpolation  and  increases  in  measurement  density  of  up  to 
70x  using  the  experimental  setup.  The  improvements  are  a  result  of  calibrated  im¬ 
aging  using  additional  knowledge  of  the  image  formulation  model  to  reconstruct  a 
3D  observation  of  the  scene.  This  research  demonstrated  the  efficacy  of  multi¬ 
sensor  mapping  systems  as  well  as  calibrated  imaging  for  field  robots. 

Perhaps  the  greatest  argument  for  range/image  super-resolution  is  that  it  is  eas¬ 
ily  bootstrapped  to  existing  systems.  Subterranean  robots  already  require  light 
sources  for  photography  as  well  as  range  sensors  for  mapping  and  many  high- 
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throughput  commercial  scanners  feature  co-located  cameras.  The  general  use  of  il¬ 
lumination  information  for  super-resolution  is  also  applicable  to  other  domains  in 
field  robotics.  Planetary  robots  are  likely  to  encounter  highly  diffuse  environments 
(i.e.  Mars)  or  characterizable  reflectances  on  bodies  lacking  scattering  atmos¬ 
pheres  (i.e.  moon,  asteroids).  Such  development  is  likely  to  increase  the  safety  of 
exploration  and  prospecting  on  the  moon,  where  sensing  is  secondary  to  payload 
and  comes  at  a  premium  cost. 

In  future  work,  agile  and  high  accuracy  applications  will  benefit  from  one  or 
more  actuated  sensors.  With  an  actuated  camera,  the  technique  can  be  used  to 
zoom  in  on  regions  of  interest  and  selectively  up- sample  a  scan.  With  an  actuated 
range  scanner,  such  as  a  focal  plane  array  LIDAR  with  variable  optics,  a  static 
camera  may  be  used  to  reconstruct  rough  low  frequency  data  from  a  preliminary 
scan  and  detect  areas  of  high  frequency,  while  a  second  LIDAR  pass  focuses  spe¬ 
cifically  on  these  areas.  This  setup  can  drastically  compress  the  amount  of  data 
taken  and  reduce  time  required  while  producing  an  optimal  reconstruction  given 
certain  throughput  constraints. 
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A  Communication  Framework  for  Cost-effective 
Operation  of  AUVs  in  Coastal  Regions 


Arvind  Pereira,  Hordur  Heidarsson,  Carl  Oberg,  David  A.  Caron,  Burton  Jones  and 
Gaurav  S.  Sukhatme 


Abstract  Autonomous  Underwater  Vehicles  (AUVs)  are  revolutionizing  oceanog¬ 
raphy.  Most  high-endurance  and  long-range  AUVs  rely  on  satellite  phones  as  their 
primary  communications  interface  during  missions  for  data/command  telemetry  due 
to  its  global  coverage.  Satellite  phone  (< e.g .,  Iridium)  expenses  can  make  up  a  signif¬ 
icant  portion  of  an  AUV’s  operating  budget  during  long  missions.  Slocum  gliders 
are  a  type  of  AUV  that  provide  unprecedented  longevity  in  scientific  missions  for 
data  collection.  Here  we  describe  a  minimally-intrusive  modification  to  the  existing 
hardware  and  an  accompanying  software  system  that  provides  an  alternative  robust 
disruption-tolerant  communications  framework  enabling  cost-effective  glider  oper¬ 
ation  in  coastal  regions.  Our  framework  is  specifically  designed  to  address  multiple- 
AUV  operations  in  a  region  covered  by  multiple  networked  base-stations  equipped 
with  radio  modems.  We  provide  a  system  overview  and  preliminary  evaluation  re¬ 
sults  from  three  field  deployments  using  a  glider.  We  believe  that  this  framework  can 
be  extended  to  reduce  operational  costs  for  other  AUVs  during  coastal  operations. 


1  Introduction  and  Motivation 

Autonomous  Underwater  Vehicles  (AUVs)  are  revolutionizing  oceanography.  They 
have  been  widely  used  for  in-situ  measurements  which  would  be  difficult,  expen¬ 
sive,  and,  in  some  cases,  impossible  to  obtain  by  using  traditional  ship-based  sam- 
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Fig.  1:  (a)  Basestation  locations  (current  and  planned)  surrounding  the  Southern 
California  Bight  (SCB),  (b)  The  Slocum  glider,  (c)  Hardware  modifications  to  the 
glider 


pling  techniques  [1].  AUVs  typically  use  thrusters,  rudders  and  fins  as  actuators  [5]. 
Gliders  [6,  3]  are  specialized  AUVs  that  rely  on  buoyancy  control  and  shifting  cen¬ 
ter  of  mass  for  propulsion,  to  fly  in  the  ocean  -  an  energy-efficient  technique  that 
results  in  long  mission  times  (3-4  weeks)  at  sea. 

Table  1  shows  several  popular  AUV  platforms,  and  their  primary  modes  of  com¬ 
munication.  The  usual  operation  of  AUVs  involves  the  creation  of  a  mission  file 
during  the  mission  planning  stage  (onshore).  Most  of  the  vehicles  in  Table  1  use 
a  radio  link  (WiFi,  radio  modem)  for  operator-vehicle  communications  when  the 
vehicle  is  near  the  operator.  This  typically  occurs  during  the  mission  upload  phase. 
Once  deployed,  AUVs  typically  communicate  with  a  basestation  onshore  (or  on  a 
ship)  using  an  acoustic  modem  or  a  satellite  phone. 

While  satellite  phones  have  the  advantage  of  being  usable  at  almost  any  ocean 
surface  location,  they  are  plagued  by  very  low  data-rates  ( e.g .,  2400  bps  maximum 
for  Iridium)  and  high  costs  for  transmitted  data  or  call  time.  Slow  data  rates  imply 
longer  times  spent  at  the  surface  for  data  transfer.  This  is  a  safety  concern  in  areas 
with  high  marine  traffic  such  as  the  Southern  California  Bight  (SCB),  our  region 
of  interest  and  operation  (the  SCB  is  the  oceanic  region  contained  within  32°N  to 
34.5°N  and  117°E  to  121°E).  Satellite  phone  communications  are  expensive.  We 
estimate  the  nominal  communication  cost  for  Iridium  usage  to  be  approximately 
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USD  2400  for  a  3  week  glider  mission  or  approximately  half  of  the  total  expendible 
cost  of  the  deployment.  Others  [6]  report  their  Iridium  communications  cost  to  be 
approximately  USD  180/day  which  translates  to  approximately  USD  3500  for  a  3 
week  mission  for  a  single  glider.  These  limitations  imply  that  during  a  surfacing, 
experimenters  using  satellite  phones  are  often  forced  to  transmit  subsets  of  data 
from  the  AUV,  instead  of  the  entire  dataset.  The  Iridium  plan  used  in  this  work,  is 
based  on  call  time. 


Table  1:  AUVs 


Name 

Manufacturer 

Endurance 

Radio 

Acoustic 

Satellite 

Bluefin-12 

Bluefin  Robotics 

10-23  h 

Yes 

Yes 

Yes 

HUGIN  1000 

Kongsberg 

17-24  h 

WiFi 

Yes 

Yes 

REMUS  600 

Hydroid 

20-45  h 

WiFi 

Yes 

Yes 

Gavia 

Hafmynd 

24  h 

WiFi 

Yes 

Yes 

SAUV  II 

Falmouth  Scientific 

Unlimited 

Yes 

Yes 

Yes 

Slocum  Electric  Glider 

Webb 

4  weeks 

Yes 

No 

Yes 

Spray  Glider 

UCSD 

months 

No 

No 

Yes 

Seaglider 

iRobot 

months 

No 

No 

Yes 

One  strategy  to  mitigate  the  shortcomings  of  satellite  phones  is  to  use  acoustic 
modems  [9,  4,  2]  or  combined  acoustic/optical  strategies  [8].  The  obvious  advantage 
is  that  AUVs  need  not  surface  to  communicate  if  they  are  using  acoustic  modems. 
However,  data  rates  on  acoustic  systems  are  typically  low,  they  also  have  a  high  one¬ 
time  cost  and  suffer  from  multi-path  interference  in  shallower  coastal  regions.  Op¬ 
tical  techniques  are  typically  shorter  range  and  unsuitable  for  operations  in  deeper 
waters. 

We  remark  that  radio  modems  ( e.g .,  the  Freewave™)  used  on  AUVs  ( e.g .,  the 
Webb  Slocum  glider  (Table  1))  are  rated  for  a  range  of  60  miles  line-of-sight.  Their 
use  need  not  be  restricted  to  dockside  operations  for  mission  upload;  it  could  be 
extended  to  large  near-coastal  regions  (e.g.,  the  SCB).  A  multi- AUV  deployment 
over  an  extended  time  period  in  a  region  as  large  as  the  SCB  could  see  significant 
cost  reductions  if  the  primary  mode  of  communication  with  the  AUV  was  a  radio 
modem  instead  of  a  satellite  phone.  Our  experimental  platform,  the  Webb  Slocum 
Glider,  is  primarily  designed  to  communicate  using  a  Iridium  satellite  modem  dur¬ 
ing  missions.  When  the  operator  is  within  Freewave  range,  the  modem  is  typically 
used  for  launch,  retrieval,  data  transfer,  and  maintenance  of  the  vessel.  In  the  course 
of  a  typical  mission,  the  Freewave  is  used  infrequently,  and  mostly  at  the  dockside. 
This  is  because  its  effective  range  to  the  operator  is  rather  small  since  it  is  rare  to 
obtain  line-of-sight  connectivity  between  vehicle  and  operator  during  a  mission  due 
to  occlusion. 

Can  the  effective  range  of  the  radio  modem  be  extended  so  that  a  region  the  size  of 
the  SCB  would  be  effectively  ’ covered ’  thus  rarely  necessitating  the  use  of  a  satellite 
phone  for  operator-AUV  operations  ?  Here  we  report  on  the  encouraging  progress 
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towards  answering  this  question  in  the  affirmative  by  1 .  designing  and  augmenting 
coastal  communication  infrastructure  (radio  modems  onshore  at  elevated  sites  for 
better  line  of  sight  connectivity  to  the  vehicles),  and  2.  designing,  implementing 
and  testing  protocols  for  data- transfer  using  radio  modems. 

To  exploit  the  radio  modem  to  its  fullest  potential  we  make  the  observation  that 
the  Southern  California  region  has  several  HF-Radar  (CODAR)  sites  at  elevated 
locations.  These  provide  accurate  ocean  surface  current  data,  and  are  always  instru¬ 
mented  with  an  internet  connection.  This  infrastructure  is  a  cost  effective  way  to  set 
up  a  network  of  radio  modem  shore  stations  to  provide  radio  modem  connectivity  to 
vehicles  on  near-coastal  missions.  Elevated  locations  provide  greater  line-of-sight 
with  the  vehicle. 

We  contend  that  with  a  minimal  modification  of  the  vehicles  and  a  small  addition 
to  existing  shore  locations  it  is  possible  to  build  a  network  of  this  kind  that  scales 
with  multiple  vehicles  at  limited  cost.  This  paper  describes  the  design  and  imple¬ 
mentation  of  such  a  system.  We  report  on  communication  tests  using  Webb  Slocum 
gliders  in  the  SCB,  with  the  expectation  that  this  strategy  can  pave  the  way  for  a 
similar  use  of  a  reduction  in  the  communication  costs  for  coastal  operation  of  other 
AUVs. 


2  The  Webb  Slocum  Glider  Communications  System 

The  glider  is  a  specialized  robot  driven  by  buoyancy  which  can  fly  in  the  ocean 
for  extended  periods  of  time  at  the  expense  of  speed  and  maneuverability.  Glider 
designers  have  devoted  significant  effort  to  power  consumption  minimization.  The 
glider’s  navigation  and  communications  are  handled  by  a  low-power  micro-controller 
called  the  Persistor™.  This  computer  performs  standard  navigational  tasks  and 
runs  a  modified  version  of  PicoDOS™called  GliderDOS™which  contains  glider- 
specific  software. 

In  normal  glider  operations,  the  glider’s  Freewave  modem  is  configured  as  a  slave 
to  connect  to  only  a  single  master  Freewave  modem  at  the  operator’s  end.  The  other 
side  runs  software  from  Webb  Research  called  the  DockServer™.  The  glider  can 
also  be  operated  via  any  terminal  client  since  it  provides  a  human-readable  interface 
via  ASCII  strings.  There  is  no  inherent  packetization  of  data  being  performed  on 
the  glider  since  it  assumes  that  it  is  always  connected  to  a  single  computer  via  either 
of  its  two  links  (Freewave  or  Iridium).  This  situation,  coupled  with  the  fact  that 
the  Freewave  modems  do  not  have  a  mode  of  operation  which  can  independently 
handle  hand-offs  between  modems,  means  that  it  is  difficult  to  build  a  reliable  end- 
to-end  system  to  communicate  with  a  glider  without  using  packetization  for  the 
identification  of  sources  and  destinations.  Any  disruption  in  communication  due  to 
loss  of  a  link  or  a  reconnection  of  the  glider  via  a  new  link,  results  in  data  corruption. 
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3  System  Design 

Our  system  consists  of  three  main  components.  At  the  glider  level,  we  have  the 
communications  module  that  handles  the  radio  communications  on  the  glider.  At 
shore,  we  have  the  internet  connected  basestations  which  have  the  radio  and  antenna 
to  communicate  with  gliders  deployed  in  the  ocean.  Finally,  we  have  the  control 
server,  a  central  data-aggregation  and  command/control  server.  This  overall  system 
design,  illustrated  in  Fig. 2  adds  more  high-level  control  to  the  glider  operations  than 
currently  possible  which  in  turn  facilitates  autonomous  re-tasking  of  the  gliders  on 
mission. 


Fig.  2:  The  System  Block  Diagram 


3.1  Communication  Module  and  Protocol 

The  communications  module  is  a  combination  of  hardware  and  software  to  handle 
communications  between  the  glider  and  shore.  The  hardware  is  specific  to  the  robot 
platform  (in  this  case  a  glider),  and  the  software  has  some  general  building  blocks 
as  well  as  a  platform- specific  interface  devoted  to  interaction  between  the  commu¬ 
nications  code  and  the  control  software  on  the  robot  itself.  We  have  implemented 
this  module  on  a  Webb  Slocum  glider,  and  this  paper  describes  experiments  with 
that  particular  AUV,  but  the  module  can  easily  be  added  to  most  other  AUVs. 

The  basestations  and  gliders  communicate  through  our  own  light-weight  com¬ 
munication  protocol.  Each  basestation  can  store  and  forward  certain  information 
between  specified  nodes  (gliders  in  this  case).  We  treat  the  Freewave  modems  as 
a  serial  link,  and  have  incoming  and  outgoing  packet  queues  which  provide  feed¬ 
back  to  vary  both  the  inter-packet  delay  as  well  as  Freewave  packet- sizes.  Freewave 
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modems  make  a  ’’best-effort”  delivery  attempt  on  these  packets,  which  can  be  frag¬ 
mented  into  smaller  pieces  in  the  event  of  poor  connections.  This  protocol  (which 
we  will  describe  in  more  detail  in  a  future  paper),  also  supports  guaranteed  delivery 
as  well  as  a  non-guaranteed  mode  of  transmission.  The  packet  structure  contains  14 
bytes  without  payload  data,  and  allows  several  applications  to  multiplex  data,  such 
as  file  transfers,  status  packets,  data  packets,  terminal  commands  and  so  on.  We  use 
a  Selective-Acknowledgement  communications  scheme  with  a  fixed  window  which 
works  well  in  regions  with  persistent  line-of-sight  between  antennae. 

On  our  Slocum  gliders,  the  communication  module  hardware  consists  of  a  Gum- 
stix  computer  and  the  aforementioned  software.  Adding  the  Gumstix  between  the 
glider’s  control  computer  and  the  Freewave  is  a  minimally  intrusive  way  of  adding 
new  communication  capabilities  to  the  glider.  Using  a  separate  computer  to  inter¬ 
face  with  a  vehicle  abstracts  the  interface  between  higher-level  communications  and 
lower  level  vehicle  control  cleanly  and  can  also  be  used  in  the  future  to  handle  glider 
re-tasking. We  chose  a  Gumstix  because,  besides  its  small  size,  it  is  a  fully  functional 
Linux-based  computer,  consumes  very  little  power  (<  120mA  @5V,  400MHz),  and 
has  good  interfacing  capabilities.  Fig.  1(c)  shows  the  physical  modifications  to  the 
glider  due  to  the  addition  of  the  Gumstix.  We  need  to  make  only  5  modifications  to 
the  glider  to  allow  the  glider  to  communicate  with  an  external  computer.  Although 
the  Gumstix  consumes  more  power  than  the  persistor  alone,  we  have  designed  the 
system  such  that  the  Gumstix  is  powered  up  along  with  the  Freewave  modem  -  a 
feature  that  ensures  that  it  gets  automatically  turned  on  at  the  water  surface,  while 
staying  shut  off  when  the  glider  is  diving. 

The  platform- specific  software  on  the  glider  intercepts  all  messages  to  and  from 
the  glider  persistor  and  the  Freewave  modem,  parses  the  ASCII  strings  from  the 
glider  and  follows  the  basic  control  flow  displayed  in  Fig.  3.  It  also  sends  necessary 
status  information  to  shore  using  our  packet  protocol,  gathers  sensor  data  files  from 
the  persistor,  compresses  them  and  sends  them  to  the  shore  station.  If  communica¬ 
tion  with  the  main  server  via  Freewave  is  unavailable,  the  glider  falls  back  to  Iridium 
to  call  in. 

3.2  Base  Stations 

The  basestations  are  the  shore  stations  that  handle  direct  communication  with  the 
gliders.  The  hardware  consists  of  an  internet-connected  computer,  running  Ubuntu 
Linux  Server,  a  Freewave  radio  and  an  antenna.  The  computer  runs  our  commu¬ 
nication  software  which  essentially  relays  datagrams  between  the  gliders  and  the 
control  server.  Communications  between  these  basestations  and  the  control  server 
take  place  via  TCP/IP.  If  the  basestation  loses  connection  to  the  control  server  it 
turns  of  its  Freewave  to  ensure  gliders  connect  to  another  basestation  or  fall  back  to 
Iridium  for  communication. 

3.3  Control  Server 

The  control  server  is  the  orchestrator  of  the  overall  system.  It  is  written  in  C++  and 
runs  on  Ubuntu  Linux  Server  and  utilizes  a  MySQL  database  for  storing  data.  It 
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Fig.  3:  Control  Flow  of  tasks  performed  during  glider  surfacings 


maintains  a  connection  to  each  of  the  basestations,  keeps  track  of  the  state  of  the 
system  and  is  in  charge  of  issuing  commands  to  connected  gliders.  It  also  notifies 
the  end  users  of  events  via  email.  All  data  it  receives  is  logged  to  a  database.  On 
top  of  the  control  server,  we  have  a  web  based  user  interface,  written  in  Java™with 
Google  Web  Toolkit,  to  provide  easy  accessible  control  and  visualization  to  the  end 
user  of  the  system.  The  server  also  runs  3rd  party  monitoring  software  to  monitor 
the  health  of  the  basestation  computers  and  software.  If  a  problem  is  encountered 
system  administrators  are  notified. 

4  Experiments  and  Results 

To  test  our  design,  we  have  performed  3  experimental  deployments.  The  first  was 
off  the  coast  of  Santa  Catalina  Island  in  November  2008.  The  other  two  were  near 
Pt.  Fermin  in  January  and  February  2009.  The  heat-map  for  File  transfers  (Fig.  4) 
is  based  on  results  extrapolated  from  glider  surfacing  and  transferring  files  to  shore 
from  locations  C,  D  and  E.  The  heat-map  for  Carrier  Detect  (Fig. 5(a))  shows  a 
percentage  of  time  the  base-stations  had  Carrier-detect  with  the  gliders  to  the  to¬ 
tal  time  of  a  surfacing.  This  metric  is  based  on  4  surfacings.  Fig.  5(b)  shows  a 
heat-map  based  on  the  percentage  of  protocol-level  Link  States  to  the  total  Carrier- 
detect  duration  during  a  given  surfacing.  This  metric  gives  us  an  idea  of  how  well 
our  protocol  is  performing  in  real  conditions.  Observations  show  that  during  inter¬ 
mittent  communicatios,  such  as  those  at  C  and  D,  the  protocol-link  suffers.  These 
heat-maps  are  based  on  simple  linear-interpolation  of  very  sparse  data  with  no  un¬ 
derlying  communication  model  assumption.  Although  we  have  not  analyzed  the 
relationship  between  sea- state  and  the  communication  quality  achieved,  the  average 
wave-height  was  approximately  lm  with  a  dominant  period  of  6  seconds  during  the 
data-collection  period. 
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Table  2:  Results  from  Field  and  Laboratory  Tests 


Test 

Maximum 
Distance  [km] 

Data  Rate 
(KB/sec) 

File 

Transfers 

Freewave 

Switching 

Quality 
of  Link 

A  -  Pt.Fermin  * 

2.6 

Not  Measured 

Not  Tested 

No 

Poor 

B  -  Pt.Fermin  * 

5.2 

Not  Measured 

Not  Tested 

No 

Poor 

C  -  Pt.Fermin  t 

12.3 

Not  Measured 

Not  Tested 

Yes 

Intermittent 

D  -  Pt.Fermin  + 

9.2 

0.1535 

Yes 

No 

Slow 

E  -  Pt.Fermin 

3.5 

1.46 

Yes 

No 

V.Good 

L  -  Lab  Tests 

N.A. 

7.883 

Yes 

No 

V.Good 

*  This  test  was  conducted  with  a  faulty  antenna  installation  at  Pt.Fermin 
f  Connection  was  made  to  both  Pt.  Fermin  and  Catalina  Island.  Distance  to  Catalina  was  20  km. 
+  Remote  operator  performed  a  glider  re-tasking  via  network  at  this  location  through  Pt.Fermin 


Fig.  4:  File  Transfer  data  rate  off  Pt.  Fermin 


Fig.  5:  a)  Carrier-detect  success  percentage  (CD  on  time/Total  time)  b)  Link-Layer 
protocol’s  uptime  percentage  (Protocol  time/CD  on  time) 
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5  Conclusion 

This  paper  outlined  the  design  behind  a  communication  system  using  long-range 
RF-modems,  to  communicate  with  AUVs  in  a  coastal  area.  Field  trials  using  Slocum 
gliders  indicate  promising  results  and  give  us  valuable  insights  into  improvements  in 
the  design.  On  the  gliders  themselves,  we  add  a  small  computer  to  interface  between 
the  original  glider  computer  and  its  Freewave  modem.  Our  communications  proto¬ 
col  supports  datagrams  with  priority,  can  maintain  in-sequence  transmissions, and 
has  both  reliable  and  un- acknowledged  datagram  modes.  Instead  of  transmitting  all 
the  data  from  the  glider,  we  create  status  packets  which  contain  a  snapshot  of  the 
gliders  state  by  parsing  its  lengthy  ASCII  transmission.  We  also  utilize  the  Gum- 
stix’s  processing  power  to  compress  data  files  before  transmission,  which  provides 
us  with  a  typical  space  saving  of  approximately  4x.  The  high  data  rates  our  system 
achieves  in  the  field  (1.46KB /sec)  is  6  times  faster  than  Iridium  while  simultane¬ 
ously  transmitting  multiplexed  glider  console  information  and  status  packets.  This 
combined  speed  increase  translates  to  a  24x  improvement,  which  allows  us  to  send 
more  data,  while  also  reducing  surface  times  and  cutting  down  on  Iridium  data- 
transfer  costs. 

Fig.  4,  a  sparsely  interpolated  map  based  on  only  3  averaged  measurements,  im¬ 
plies  that  fairly  high  throughputs  are  possible  close  to  a  base-station  (<4km),  with 
a  fairly  sharp  bit-rate  drop  from  1.46KB/s  to  approximately  154bytes/sec.  This  sig¬ 
nificant  drop  is  due  to  the  links  becoming  more  intermittent  as  carrier  detect  on  the 
radio  is  only  available  65%  of  the  time  at  a  distance  of  9.2km,  while  it  is  more  than 
86.7%  at  3.5km.  The  measurements  of  communication  performance  we  present  here 
are  sparse  and  were  collected  at  four  surfacing  locations,  but  they  represent  charac¬ 
teristic  portions  of  a  typical  coastal  belt  that  needs  coverage.  Field  tests  have  shown 
that  our  system  allows  status  packets  to  be  reliably  transmitted  from  distances  upto 
20km  (Glider  Surfacing  C).  We  have  successfully  performed  glider  re-tasking  via 
the  network  from  a  distance  of  9.2km  -  a  feature  we  will  use  in  the  future  to  enable 
mission  re-planning  based  on  data  gathered  by  the  glider.  We  have  also  developed  a 
central  server  which  allows  us  to  easily  collect  and  visualize  data  from  the  glider,  or 
create  and  send  new  mission  files  through  the  network. 

6  Future  Work 

Experimental  results  while  promising,  show  that  there  is  room  for  improvement. 
From  Fig. 5  (a)  and  (b)  we  make  the  observation  that  we  can  improve  upon  our 
protocol,  such  that  its  link  spans  all  the  time  the  radio  has  carrier.  We  understand 
that  this  is  a  consequence  of  protocol  choices,  which  were  tuned  to  obtain  good 
results  at  the  lab  -  which  as  observed,  is  significantly  different  from  conditions  in 
the  field.  We  believe  local  conditions  due  to  waves  play  a  major  role  in  causing 
communications  disruptions,  since  the  antenna  of  the  Slocum  glider  is  very  close  to 
the  waters  surface  and  local  waves  occlude  line-of- sight  between  radios.  We  believe 
that  by  using  better  queue  management,  introducing  variations  of  re-transmit  time 
and  packet  sizes,  based  on  the  link  state,  can  lead  to  significant  improvements  in 
ensuring  we  have  a  much  better  protocol-level  link  between  the  glider  and  base- 
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stations.  We  are  also  in  the  process  of  mapping  out  the  entire  region  of  interest 
for  link  quality.  Equipped  with  such  a  map,  we  can  then  design  planners  which 
incorporate  the  knowledge  of  communication  link  availability  to  bias  the  surfacings 
of  AUVs  such  that  they  keep  overall  operation  costs  low.  Concurrent  work  [7]  in 
our  lab,  used  Iridium  to  perform  feature  tracking  based  on  ocean  model  predictions 
for  the  Southern  California  Bight  region.  By  using  our  communication  system  to 
perform  mission  adaptations,  we  will  get  a  more  realistic  comparison  between  cost- 
savings  using  it  instead  of  Iridium. 
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Multi-Robot  Collaboration  with 
Range-Limited  Communication: 
Experiments  with  Two  Underactuated 
ASVs 
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Abstract  —  We  present  a  collaborative  team  of  two  under-actuated  au¬ 
tonomous  surface  vessels  (ASVs)  that  performs  a  cooperative  navigation  task 
while  satisfying  a  communication  constraint.  Our  approach  is  based  on  the 
use  of  a  hierarchical  control  structure  where  a  supervisory  module  commands 
each  vessel  to  perform  prioritized  elementary  tasks,  a  behavior-based  con¬ 
troller  generates  motion  directives  to  achieve  the  assigned  tasks,  and  a  ma¬ 
neuvering  controller  generates  the  actuator  commands  to  follow  the  motion 
directives.  The  control  technique  has  been  tested  in  a  mission  where  a  set  of 
target  locations  spread  across  a  planar  environment  has  to  be  visited  once 
by  either  of  the  two  ASVs  while  maintaining  a  relative  separation  less  than  a 
given  maximum  distance  (to  guarantee  inter- AS V  wireless  communication). 
Experiments  were  carried  out  in  the  field  with  a  team  of  two  ASVs  visiting 
22  locations  on  a  lake  surface  (approximately  30000m2)  with  static  obstacles. 
Results  show  a  30%  improvement  in  mission  time  over  the  single-robot  case. 

1  Introduction 

A  significant  body  of  literature  deals  with  the  motion  control  of  aquatic 
vehicles  for  autonomous  navigation  [11].  Interest  in  the  field  is  motivated 
by  different  applications,  e.g.,  naval  system  applications,  harbor  operations, 
defense  and  patrolling  of  coastal  perimeters,  and  marine  biology  applications. 
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Motion  control  of  Autonomous  Surface  Vessels  (AS Vs)  has  been  studied  in 
the  context  of  dynamic  positioning  for  fully  actuated  [15]  and  under-actuated 
vessels  [18],  as  well  as  trajectory  tracking  [14,  1].  Separately,  collaboration  in 
multi-robot  teams  [10]  has  been  widely  studied.  In  the  intersection  of  these 
two  areas  -  motion  control  of  a  fleet  of  AS  Vs  -  different  approaches  have 
been  proposed,  however  most  of  them  have  only  been  validated  by  numerical 
simulations  [8] ,  or  by  experiments  with  a  single  real  vessel  and  simulating  the 
others  [13].  Here,  we  focus  on  the  cooperative  control  for  ASVs  and  present 
the  results  of  experiments  with  a  team  of  two  underactuated  ASVs  in  the 
field  performing  a  joint  mission  motivated  by  environmental  monitoring. 

We  provide  a  list  of  target  locations  to  the  vessels.  They  must  dynamically 
allocate  locations  among  themselves  such  that  each  location  is  visited  exactly 
once,  and  all  obstacles  are  avoided.  The  approach  proposed  here  is  applicable 
to  a  scenario  with  dynamic  obstacles  and  targets  since  it  does  not  pre-plan  a 
route.  In  the  course  of  the  mission  the  ASVs  are  required  to  maintain  connec¬ 
tivity  at  all  times.  Unlike  [16]  where  connectivity  is  maintained  by  measuring 
signal  strength  and  the  mobility  controller  is  a  spring-damper  system,  we 
use  a  layered  hierarchical  control  decomposition  which  accommodates  a  dy¬ 
namic  environment  (targets  and  obstacles  may  be  added  dynamically) .  When 
needed,  a  leader- follower  configuration  allows  the  team  to  allocate  a  target 
to  the  vessel  closest  to  the  next  target,  while  the  other  vessel  ensures  the 
communication  constraint  is  maintained.  Our  results  from  five  field  trials  at 
a  lake  with  a  two-ASV  team  (communication  constraint  of  60m,  in  an  explo¬ 
ration  area  of  300m  x  100m)  show  a  30%  improvement  in  exploration  time 
compared  to  a  single  vessel. 

We  focus  on  navigation  techniques  for  a  realistic  environment  where  sev¬ 
eral  obstacles  can  be  found.  Thus,  we  make  use  of  a  behavior-based  technique 
as  a  guidance  control  to  take  advantage  of  its  reactivity  to  unpredicted  con¬ 
ditions  [9,  6].  In  particular,  we  present  the  use  of  a  behavior-based  technique 
called  the  Null- Space  based  Behavioral  (NSB)  control  as  a  guidance  system 
for  ASVs.  The  NSB  approach  has  been  extensively  tested  for  the  control  of 
autonomous  ground  robots  and  results  have  shown  robust  control  for  forma¬ 
tion  and  spread  control  for  a  team  of  robots  [4] ,  escorting  an  external  agent 
with  a  team  of  robots  [2],  and  the  formation  control  of  a  fleet  of  ASVs  [7]. 

2  Problem  Description 

We  require  a  set  of  pre-specified  locations  to  be  visited  by  one  (and  only  one) 
of  the  ASVs  exactly  once.  During  the  execution  of  the  mission,  both  ASVs 
must  ensure  that  their  relative  distance  does  not  exceed  a  preset  bound. 
Needless  to  say,  the  vessels  must  avoid  mutual  collisions  and  collisions  with 
obstacles  in  the  environment.  We  decompose  the  overall  mission  into  ele¬ 
mentary  tasks  that  can  be  then  prioritized  and  implemented  individually;  in 
particular:  1.  Avoid  obstacles  and  inter-robot  collisions,  2.  Satisfy  the  com¬ 
munication  constraint,  and  3.  Navigate  to  assigned  target  locations. 
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While  we  discuss  the  details  in  the  next  section,  broadly  speaking  each 
ASV  properly  activates  and  combines  these  three  modes.  When  the  distance 
between  the  vessels  is  consistently  lower  than  the  communication  bound,  they 
are  free  to  choose  targets  and  navigate  to  them  independently.  When  the  dis¬ 
tance  is  close  to  the  communication  range,  a  supervisory  module  on  each 
ASV  has  to  ensure  the  communication  constraint  is  satisfied.  This  is  done 
using  a  leader-follower  policy  where  one  of  the  vessels  (the  leader)  continues 
with  its  mission,  while  the  other  (the  follower)  has  to  enforce  the  separation 
constraint  adapting  its  motion  to  the  leader.  The  navigation  task  moves  each 
(single)  vessel  toward  its  assigned  location.  Since  the  vessels  must  avoid  all 
collisions,  the  obstacle  avoidance  controller  takes  evasion  action  when  obsta¬ 
cles  or  other  vessels  are  within  a  safety  margin. 

The  targets  to  be  visited  are  dynamically  chosen  by  the  vessels  on  the  basis 
of  their  locations.  A  communication  sub- system  ensures  that  a  shared  target 
and  obstacle  map  are  maintained  both  of  which  can  be  updated  dynamically 
from  shore,  allowing  dynamic  missions. 

3  Control  Strategy 

In  order  to  achieve  the  proposed  mission  in  a  cooperative  way,  the  control 
architecture  has  been  organized  into  a  three- level  hierarchy,  as  shown  in  Fig¬ 
ure  1:  Supervisor,  NSB,  and  Maneuvering  control. 

At  the  highest  level,  a  supervisor  is  in  charge  of  selecting  the  active  tasks 
for  the  vessel  and  their  reference  values,  i.e.,  it  activates  the  obstacle  avoid¬ 
ance  behavior  when  the  vessel  is  close  to  the  other  vessel  or  to  a  static  obsta¬ 
cle,  it  defines  the  next  target  to  be  visited  and  it  activates  a  leader-follower 
policy  to  avoid  breaking  the  communication  link.  In  order  to  make  its  deci¬ 
sions,  the  supervisor  makes  use  of  some  information  about  the  vessel  position 
(read  from  the  vessel’s  GPS),  the  map  of  the  environment,  the  set  of  visited 
and  un-visited  locations,  and  information  received  from  the  other  vessel’s 
supervisor.  An  intermediate  level  implements  a  behavior-based  technique, 
namely  the  Null-Space  based  Behavioral  (NSB)  control,  to  simultaneously 
achieve  multiple  tasks  with  different  priority;  the  NSB,  on  the  basis  of  the 
active  tasks  and  their  relative  priority,  defines  motion  directive  for  the  ves¬ 
sel  (e.g.,  the  desired  velocity  and  motion  direction).  Finally,  the  lowest-level 
controller  is  a  maneuvering  control  that,  taking  into  consideration  the  under¬ 
actuated  actuation  system  of  the  vessel,  defines  the  reference  commands  for 
the  actuators  in  order  to  follow  the  motion  directives  received  by  the  NSB. 

3.1  The  Supervisor 

The  supervisor  module  on  each  vessel  has  the  knowledge  of  the  overall  mis¬ 
sion.  In  particular,  it  knows  the  target  map  and,  in  order  to  avoid  revisiting 
the  same  targets  during  the  course  of  the  mission,  it  keeps  track  of  targets 
visited  by  the  vessel  itself  or  by  the  other  one.  When  a  location  is  reached  and 
a  new  target  has  to  be  chosen,  the  supervisor  finds  the  target  nearest  to  the 
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Fig.  1  Sketch  of  control  architecture  for  a  the  team  of  two  under-actuated  vessels. 


current  vessel  position  among  the  non- visited  targets  different  from  the  one 
currently  assigned  to  the  other  vessel.  Once  all  the  targets  have  been  visited, 
the  vessels  are  asked  to  reach  and  keep  a  final  restoring  configuration. 

During  the  vessels’  motion,  the  two  supervisory  modules  are  in  charge  of 
keeping  the  communication  constraint.  To  fulfil  this  constraint,  when  needed, 
they  activate  a  leader-follower  policy  to  cause  one  vessel  to  take  care  of  the 
communication  constraint  while  the  other  continues  its  mission.  The  choice 
of  which  vessel  is  the  leader  and  which  the  follower  is  negotiated  between  the 
supervisors  on  the  basis  of  the  distances  between  the  vessels  and  their  next 
targets.  In  particular,  the  vessel  that  is  the  closest  to  its  next  target  becomes 
the  leader  while  the  other  becomes  the  follower.  The  leader  continues  its 
mission  ignoring  the  other  vessel,  while  the  follower  has  to  control  its  distance 
from  the  leader  while  also  trying  to  stay  close  to  its  target.  The  follower  is 
also  allowed  to  switch  to  a  new  target  if  this  is  closer  than  that  previously 
assigned  to  it.  If  the  communication  link  breaks  or  the  distance  between  the 
vessels  exceeds  a  maximum  threshold,  the  leader  stops  to  wait  for  the  follower 
moving  toward  it.  Moreover,  the  supervisory  module  decides  when  to  activate 
the  obstacle/collision  avoidance.  Finally,  the  supervisory  module  defines  the 
priority  order  of  the  active  tasks. 

3.2  The  Null- Space  based  Behavioral  Control 

The  Null- Space  based  Behavioral  (NSB)  control  is  a  behavior-based  approach 
aimed  at  controlling  the  motion  of  autonomous  vehicles  in  dynamic  scenarios. 
In  particular,  the  NSB,  whose  details  can  be  found  in  [3,  2],  uses  a  hierarchy- 
based  structure  to  simultaneously  achieve  multiple  tasks  using  a  projection 
technique  to  delete  the  components  of  the  lower  priority  tasks  that  would 
conflict  with  the  highest  ones.  Here,  the  NSB  is  used  as  a  guidance  system 
for  an  ASV  that,  on  the  base  of  the  active  tasks  and  of  their  priority  order, 
has  to  define  the  motion  directives  for  the  vessel. 

Following  the  line  of  behavior-based  approaches,  the  mission  of  the  vessel 
is  decomposed  into  elementary  tasks.  For  each  task  a  suitable  task  function 
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is  defined  as  a  =  /(p),  where  a  GlRm  is  the  task  variable  to  be  controlled, 
m  is  the  task  function  dimension,  and  pGlRn  is  the  vessel  position. 

For  each  task,  the  velocity  reference  for  the  vessel  is  specified,  starting 
from  desired  values  cr^(t)  of  the  task  function,  solving  the  inverse  kinematic 
problem  at  a  differential  level.  Thus,  the  velocity  reference  of  the  generic  ith- 
task  is  calculated  as  Vi  =  j\  Aid’i'j  ,  where  j\  is  the  pseudo-inverse  of 

the  task  function  Jacobian,  is  a  constant  positive-definite  matrix  of  gains 
and  is  the  task  error  defined  as  <x-  =  o‘ijd  —  cr*. 

When  the  mission  is  composed  of  multiple  tasks,  the  overall  vessel  velocity 
is  obtained  by  properly  merging  the  outputs  of  the  individual  tasks.  A  velocity 
vector  for  each  task  is  computed  as  if  it  were  acting  alone;  then,  before  adding 
the  single  contribution  to  the  overall  vehicle  velocity,  a  lower-priority  task 
is  projected  onto  the  null  space  of  the  immediately  higher-priority  task  so 
as  to  remove  those  velocity  components  that  would  conflict  with  it.  If  the 
subscript  i  also  denotes  the  priority  of  the  task  with,  e.g.,  Task  1  being  the 
highest-priority  one,  the  overall  vessel  velocity  is  given  by: 


VNSB  =  Vi  + 


(j-jIa) 


V2  + 


(/  -  aj.) 


V3 


(1) 


where  yl  —  j\jij  represents  the  null-space  projector  of  the  ith- task,  i.e.,  it 

filters  the  velocity  components  that  would  conflict  with  the  ith- task. 

To  achieve  the  mission  described  in  Sec.  2,  three  tasks  have  to  be  defined: 

a)  Obstacle- avoidance:  This  behavior,  when  active,  is  always  the  highest 
priority  task  because  its  goal  is  to  preserve  the  integrity  of  the  vessel.  In  the 
presence  of  an  obstacle/vessel  in  the  advancing  direction,  its  aim  is  to  keep  the 
vessel  at  a  safe  distance  from  it.  Thus,  its  implementation  produces  as  output 
a  velocity,  in  the  vessel-obstacle  direction,  that  keeps  the  vessel  at  a  safe 
distance  from  the  obstacle.  Formally,  the  task  function  is  aQ  =  \\p  —  pQ\\  G  IR 
where  pQ  is  the  obstacle  position,  and  J0  =  fT  G  IRlx2  is  the  task  Jacobian 
where  r  =  ||p_p°||  is  the  unit  vector  aligned  with  the  obstacle-to-vehicle 
direction.  Defining  as  gq^  =  d  desired  distance,  the  task  output  is 


v0  =  ||p -p0||). 


(2) 


Possible  motions  in  this  task  null-space  are  all  the  motions  that  do  not  change 
the  distance  from  the  obstacle.  Thus,  the  null-space  projector  projects  the 
velocity  commands  of  the  lower-priority  tasks  along  the  tangential  direction 
of  a  circle  centered  in  the  obstacle  and  passing  through  the  vessel  itself. 

b )Move-to-target:  Defining  the  task  function  as  crt  =  P  G  IR2,  whose 
Jacobian  is  Jt  =  I  G  IR2x2  and  assigning  the  desired  value  as  crt:d  =  Pt-> 
then,  the  output  of  the  task  is  a  velocity,  in  the  target  direction,  proportional 
to  the  distance  from  the  target  pt : 


vt  =  At  (pt  -  p) 


(3) 
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c )Keep  the  communication  constraint :  To  fulfill  the  communication  con¬ 
straint,  a  leader  follower  approach  can  be  applied.  The  follow-the- leader  task 
is  aimed  at  keeping  the  follower  (whose  position  is  Pf)  at  a  distance  d  from 
the  leader  position  pt.  The  task  function  mathematical  definition  is  analo¬ 
gous  to  the  obstacle  avoidance  task,  while  its  output  is  a  velocity,  in  the 
leader-to-follower  direction,  proportional  to  the  difference  among  desired  and 
measured  distance;  moreover,  the  desired  velocity  of  the  leader  is  added  as  a 
feedforward  term: 


vf  =  Af(\\Pi-Pf\\~d)  ||p|_pj  +«<•  (4) 


3.3  Maneuvering  Control 

The  maneuvering  controller  is  an  onboard  controller  aimed  at  steering  the 
vessel  along  a  desired  path  and  moving  it  with  a  desired  velocity  [11,  12]. 
Receiving  motion  reference  commands  from  the  NSB,  the  maneuvering  con¬ 
troller  has  to  generate  the  generalized  forces  applied  by  the  actuators. 


Fig.  2  a)  The  ASV  motion  reference  model;  b)  Two  vessels  during  the  experiment. 


Basing  on  the  model  for  ASV  in  [11]  and  considering  the  underactuated 
propulsion  system  of  the  vessel  (see  Figure  2. a),  a  maneuvering  controller  is 
designed  following  the  approach  proposed  in  [17].  From  this  approach,  the 
maneuvering  control  can  be  expressed  as  the  sum  of  a  heading  autopilot  and 
a  surge  control  aimed  at  causing  the  vessel  to  follow  the  velocity  reference 
commands.  The  heading  autopilot  is  aimed  at  controlling  the  heading  of  the 
vessel  to  make  it  move  in  the  desired  direction  Xnsb •  In  particular,  it  regu¬ 
lates  the  propulsion  torque  and  the  rudder  angle  to  correct  the  orientation  of 
the  vessel.  The  surge  control  has  to  make  the  norm  velocity  of  the  vessel  to 
track  the  value  generated  by  the  NSB;  however,  the  vessel  has  to  move  at  full 
speed  only  when  the  orientation  error  is  null.  Thus,  the  surge  control  works 
as  a  PI  controller  regulating  the  advancing  direction  multiplied  by  a  scaling 
factor  depending  on  the  orientation  error.  Following  the  control  architecture 
of  Figure  1,  the  output  of  the  NSB,  that  is  a  velocity  vector  generated  for 
a  material  point,  can  be  geometrically  represented  through  its  norm  Unsb 
and  its  direction  Xnsb  that  are  given  to  the  maneuvering  control  as  desired 
surge  and  heading/advancing  direction. 
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4  Experimental  Results 

In  this  section  the  experimental  results  of  the  mission  execution  with  the  pro¬ 
posed  control  architecture  are  illustrated.  The  platform  used  for  this  exper¬ 
iment  consists  of  two  Autonomous  Surface  Vehicles  (AS Vs)  designed  by  the 
University  of  Southern  California’s  Robotic  Embedded  Systems  Lab.  Each 
ASV  is  an  OceanScience  QBoat-I  hull  with  a  length  of  2.13  m  and  a  width 
of  0.71  m  at  the  widest  section.  Each  ASV  is  equipped  with  an  onboard  com¬ 
puter,  a  wireless  bridge,  and  a  navigation  package  consisting  of  a  GPS  unit, 
a  three- axis  accelerometer,  a  compass,  and  a  rate- gyro;  the  vessels  weighs  ap¬ 
proximately  50  kg  with  instrumentation  and  batteries.  The  software  for  each 
vessel  was  written  in  C++  running  under  the  linux  operating  system.  Multi¬ 
ple  processes  manage  mission  planning,  navigation,  control  and  the  commu¬ 
nication  between  the  vessels. 


Fig.  3  a)  Experiment  site  at  Echo  Park  Lake,  Los  Angeles;  b)  Paths  followed  by  the 
vessels  during  the  first  experiment,  overlaid  on  the  environmental  map. 


The  mission  for  the  team  of  two  AS  Vs  was  to  visit  a  set  of  22  target 
locations  spread  in  the  Echo  Park  lake  in  Los  Angeles  (see  Figure  3).  Ad¬ 
ditionally,  a  communication  constraint  of  a  maximum  distance  of  60  m  was 
to  be  maintained,  and  collisions  between  the  vessels  and  with  external  static 
obstacles  (small  islands  present  in  the  lake)  were  to  be  avoided. 

The  GPS  coordinates  of  the  locations  to  be  visited  were  provided  to 
the  AS  Vs  at  the  beginning  of  the  experiment.  The  two  vessels  had  to  au¬ 
tonomously  navigate,  communicate  and  cooperate  to  visit  all  the  locations 
while  avoiding  multiple  visits  to  any  location  and  preserving  the  communica¬ 
tion  constraint.  Figures  3.b  and  4  show  the  paths  followed  by  the  two  vessels 
at  the  experimental  site  in  the  course  of  three  different  trials.  In  particular, 
Figure  3.b  shows  the  obstacles’  positions  and  the  safety  areas  of  the  obstacle 
avoidance  functions  (activated  only  when  inside  these  areas)  overlaid  on  the 
environmental  map.  In  this  trial  is  clear  that  all  the  locations  were  visited 
while  none  of  them  was  visited  multiple  times. 

Figure  5. a  shows  the  relative  distance  between  the  vessels  during  the 
experimental  trial  of  Fig.  3.b.  The  leader-formation  task  was  activated  by 
the  supervisory  modules  when  the  relative  distance  between  the  vessels  was 
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Fig.  4  Paths  followed  by  the  vessels  during  the  second  and  third  experimental  trials  at 
the  Echo  Park  Lake,  Los  Angeles 


greater  than  50  m.  While  the  relative  separation  was  below  this  value,  the  ves¬ 
sels  moved  independently.  Since  the  leader- follower  task  is  one-dimensional, 
the  null-space  projection  of  the  task  allows  the  system  to  attempt  achiev¬ 
ing  lower-priority  tasks,  thus,  the  follower  can  use  internal  movements  of  the 
leader-follower  task  (i.e. ,  movements  that  don’t  change  the  relative  distance 
between  the  vessels)  to  try  to  achieve  lower  priority  tasks  (e.g.,  move  toward 
the  target).  When  the  distance  between  the  vessels  exceeds  60m,  the  leader 
stops  to  wait  for  the  follower  moving  toward  it.  This  situation  arises  when  the 
follower  encounters  an  obstacle  along  its  path  which  activates  the  obstacle 
avoidance  task  with  highest  priority,  thus  causing  the  follower  to  lag. 

A  video  of  the  experimental  data  (GPS  and  compass  readings  of  the  ves¬ 
sels)  is  available  at  http://webuser.unicas.it/arrichiello/video/collabASV.mpg. 
The  video  shows  the  dynamic  behavior  of  the  vessels  while  reaching  the  lo¬ 
cations,  avoiding  collisions  and  obstacles.  Moreover,  it  shows  the  activation 
of  the  leader-follower  policy  (represented  by  a  change  of  color  of  the  vessels) 
when  out  form  the  communication  range. 

Figure  5.b  shows  the  desired  and  measured  norm  of  velocity  of  one  of 
the  vessels  during  the  first  400  s  of  the  mission.  It  is  worth  noticing  that 
the  requested  velocity  is  saturated  to  1  m/s, moreover  the  assigned  velocity 
decreases  close  to  0  when  reaching  the  assigned  location.  The  more  irregular 
behavior  after  the  first  300  s  is  due  both  to  the  leader- follower  task  and  to 
the  obstacle  avoidance  task. 

We  have  performed  several  mission  trials.  The  duration  of  each  trial  was 
between  13  and  14  minutes,  during  which  time  all  targets  were  visited,  no 
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Fig.  5  a)  Plot  of  the  relative  distance  between  the  vessels  during  three  field  trials.  The 
leader-follower  strategy  is  activated  when  the  distance  is  greater  than  50  m.  The  leader 
has  to  stop  and  wait  for  the  follower  when  their  distance  is  greater  than  60  m;  b)  Plot  of 
one  of  the  vessels’  desired  and  measured  velocity  during  the  first  400  s  of  an  experiment. 


target  was  visited  more  than  once,  and  the  communication  constraint  was 
respected.  The  same  mission  has  been  also  executed  with  a  single  vessel 
to  provide  a  speedup  benchmark.  The  mission  with  the  single  vessel  (with 
the  same  target  locations  as  in  the  two  vessel  case)  takes  approximately  20 
minutes,  thus  the  use  of  two  vessel  shows  a  30%  improvement  in  mission  time 
over  the  single  vessel  case.  While  the  present  system  works  well,  we  believe 
that  in  the  case  of  static  targets,  better  results  can  be  achieved  by  advance 
planning  the  target  assignments  in  order  to  optimize  the  execution  time.  Our 
focus  in  the  immediate  future  is  to  work  with  dynamic  settings  in  which 
obstacles  can  be  added  or  detected  during  the  mission  execution,  and  targets 
may  appear  or  disappear  during  run-time.  In  principle,  the  control  strategy 
proposed  here  will  be  effective  in  such  settings. 

5  Conclusion 

In  this  paper,  we  presented  a  collaborative  exploration  technique  for  a  team 
of  two  under- actuated  AS  Vs  designed  for  marine  biology  and  oceanography 
experiments.  The  control  approach,  based  on  a  behavior-based  technique  cou¬ 
pled  with  a  maneuvering  controller,  was  experimentally  tested  in  a  lake  in 
Los  Angeles.  The  mission  consisted  of  multiple  targets  that  had  to  be  visited 
while  maintaining  the  communication  link  between  the  vessels,  i.e.,  ensuring 
that  the  relative  distance  never  exceeded  a  threshold.  The  robots  successfully 
navigated  to  all  the  targets  while  avoiding  collisions  and  obstacles,  validating 
the  presented  control  approach.  In  the  future,  we  plan  to  use  the  presented 
techniques  for  large  scale  biological  sampling  experiments  in  marinas  and 
lakes  with  obstacles  with  two  or  more  AS  Vs.  To  extend  the  proposed  tech¬ 
nique  to  a  larger  team  of  AS  Vs  we  will  start  from  the  technique  proposed  in 
[5]  to  deal  with  Mobile  Ad-hoc  NETworks  (MANETs)  and  ensure  the  global 
connectivity  of  the  team  while  executing  the  assigned  mission. 
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Abstract  Autonomous  surface  craft  (ASC)  are  increasingly  attractive  as  a  means 
for  performing  harbor  operations  including  monitoring  and  inspection.  However, 
due  to  the  presence  of  many  fixed  and  moving  structures  such  as  pilings,  moorings, 
and  vessels,  harbor  environments  are  extremely  dynamic  and  cluttered.  In  order  to 
move  autonomously  in  such  conditions  ASC’s  must  be  capable  of  detecting  sta¬ 
tionary  and  moving  objects  and  plan  their  paths  accordingly.  We  propose  a  simple 
and  scalable  online  navigation  scheme,  wherein  the  relative  motion  of  surrounding 
obstacles  is  estimated  by  the  ASC,  and  the  motion  plan  is  modified  accordingly  at 
each  time  step.  Since  the  approach  is  model-free  and  its  decisions  are  made  at  a 
high  frequency,  the  system  is  able  to  deal  with  highly  dynamic  scenarios.  We  de¬ 
ployed  ASC’s  in  the  Selat  Pauh  region  of  Singapore  Harbor  to  test  the  technique 
using  a  short-range  2-D  laser  sensor;  detection  in  the  rough  waters  we  encountered 
was  quite  poor.  Nonetheless,  the  ASC’s  were  able  to  avoid  both  stationary  as  well 
as  mobile  obstacles,  the  motions  of  which  were  unknown  a  priori.  The  successful 
demonstration  of  obstacle  avoidance  in  the  field  validates  our  fast  online  approach. 

1  Introduction 

The  need  for  monitoring  and  securing  harbor  environments  has  grown  in  recent 
years,  as  a  result  of  increased  attention  to  pollution  from  runoff  or  other  sources, 
natural  processes  such  as  sediment  transport,  water  properties,  and  algal  blooms, 
as  well  as  security  against  threats.  Harbors,  with  high  density  of  goods,  vessels, 
and  people,  are  heavily  utilized  but  fragile  infrastructures.  Among  the  world’s  har¬ 
bors,  Singapore  Harbor  is  recognized  as  one  of  the  largest  in  terms  of  total  tonnage 
shipped  [10],  with  several  hundreds  of  large  ships  present  at  any  given  time.  At  the 
same  time,  the  city  of  Singapore  is  intimately  linked  with  the  harbor.  Any  devel¬ 
opment  on  land  directly  affects  the  harbor.  In  many  ways,  Singapore  represents  the 
most  important  and  difficult  worldwide  harbor  environment  for  monitoring. 

Autonomous  systems  are  now  at  the  level  of  maturity  that  they  can  be  brought  to 
bear  on  the  overall  needs  of  harbor  observation.  Autonomous  surface  craft  (ASC) 
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such  as  robotic  kayaks  are  particularly  well-suited  due  to  their  low  unit  cost  and 
high  loading  capacity;  such  ASC’s  can  be  used  in  extremely  shallow  waters,  where 
an  autonomous  underwater  vehicle  would  be  impractical  physically  and  acoustic 
navigation  would  be  difficult. 

Several  difficulties  dominate  autonomous  agents  in  harbor  environments.  First, 
harbors  have  numerous  structures  and  vessels  both  small  and  large  which  must  be 
detected.  The  smaller  vessels  may  not  use  the  Automatic  Identification  System,  or 
AIS,  to  broadcast  the  ship’s  data  and  are  prone  to  unexpected  maneuvers.  Larger  ves¬ 
sels,  while  presumably  easier  to  detect  at  large  distances,  cannot  realistically  take 
actions  to  avoid  hitting  an  ASC.  Numerous  underwater  or  near-surface  obstacles 
such  as  shipwrecks  are  common  in  harbors.  Above- water  structures  and  vessels  can 
also  endanger  communications  between  the  vehicles  and  other  parts  of  the  system. 
Secondly,  harbors  can  experience  strong  tides  and  tidal  currents,  which  are  often 
complicated  by  variable  bathymetry.  Currents  can  be  predicted  and  made  available 
to  operators,  but  sometimes  significant  deviations  occur,  perhaps  in  the  form  of  large 
eddies.  Autonomous  systems  have  to  be  able  to  develop  optimized  paths  and  adap¬ 
tive  actions  that  are  robust  against  such  disturbances.  In  this  paper,  we  describe  a 
series  of  tests  that  utilized  autonomous  kayaks  in  Singapore  Harbor  during  January 
2009,  with  a  focus  on  the  obstacle  avoidance  problem. 

Prior  Work  on  Obstacle  Avoidance  :  Local  reactive  obstacle  avoidance  tech¬ 
niques  [1,  2,  3]  have  been  quite  popular  due  to  their  simplicity  and  fast  computation. 
Some  works  [2,  3],  utilize  the  natural  robot-centric  polar  frame  to  choose  the  best 
direction  to  move.  While  these  algorithms  plan  in  position  space,  others  [11,4]  map 
the  obstacles  in  the  velocity  space  and  choose  suitable  control  parameters  to  satisfy 
kinematic  constraints.  Velocity  obstacles  (VO)  [6],  incorporate  the  dynamics  of  ob¬ 
stacle  motion  into  the  velocity  space.  A  common  way  to  handle  obstacle  motion  in 
known  environments  is  to  augment  the  configuration  space  by  a  time  axis  [5,  7]. 
When  exact  motion  of  the  obstacle  are  unknown,  predictive  techniques  [8,  9]  are 
used  to  identify  the  motion  parameters. 

Our  approach  is  similar  in  principle  to  the  VO  approach  except  that  it  is  for¬ 
mulated  in  position  space.  By  planning  in  a  relative  frame,  we  avoid  modeling  the 
kayak  and  the  obstacle  motion  individually  in  the  rough  sea.  A  simple  linear  pre¬ 
diction  based  on  the  immediate  history  is  used  to  determine  the  relative  obstacle 
velocity.  This  keeps  the  computation  load  light  as  the  algorithm  runs  at  a  high  fre¬ 
quency  and  helps  in  bounding  the  uncertainty  errors  at  each  step. 

2  Working  in  the  Singapore  Harbor  Environment 

Equipment  :  The  ASC’s  utilized  in  Singapore  Harbor  are  each  equipped  with  a 
GPS  receiver,  a  compass,  and  wireless  communications  gear  in  the  base  configura¬ 
tion.  To  support  the  aquisition  of  evironmental  data  a  number  of  other  sensors  were 
added.  A  Blueview  blazed  array  imaging  sonar  was  used  to  image  corals  and  ship¬ 
wrecks.  A  Velodyne  3-D  scanning  laser  imaged  above- water  structures  including 
an  oil  platform.  Also,  an  RDI  doppler  velocimeter  measured  the  ASC’s  speed  over 
ground.  Each  ASC  has  a  full-thrust  mission  duration  of  about  three  hours. 
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Fig.  1.  (a)  Selat  Pauh  operational  area,  (b)  A  range  sensor  on  the  ASC  is  used  for  data  acquisition 
in  real-time  preparation  for  traffic  avoidance 

For  obstacle  avoidance,  a  single  SICK  2-D  laser  scanner  was  utilized.  The  range 
is  250m  and  the  resolution  is  on  the  order  of  centimeters.  We  were  able  to  use  the 
full  10Hz  scan  rate  of  the  sensor  in  our  algorithm.  The  obstacle  avoidance  operation 
uses  the  onboard  GPS  and  compass  for  waypoint  navigation,  and  the  2-D  laser  for 
obstacle  detection.  In  the  remainder  of  this  section,  we  describe  several  operational 
issues  relevant  to  the  Singapore  Harbor  environment:  the  effect  of  strong  currents 
on  navigation  and  the  effect  of  waves  on  obstacle  detection. 

Effect  of  Ocean  Currents  on  Navigation  :  It  can  be  observed  that  ocean  currents 
greatly  affect  navigation.  The  currents  we  encountered  in  Singapore  Harbor  reached 
1.6m/s,  whereas  the  maximum  speed  through  water  of  the  kayaks  was  about  2.4m/s 
on  a  fully  charged  battery.  As  seen  in  Figure  2,  the  closeness  of  these  two  values 
means  that  a  simple  waypoint-following  controller  can  be  unsatisfactory  depending 
on  the  goal  of  the  mission.  Here  the  kayak  was  given  four  waypoints,  effectively 
defining  a  square  box  to  be  traversed.  The  controller,  developed  for  operation  in  low- 
current  conditions,  gives  the  kayak  one  of  the  desired  waypoints  to  travel  towards. 
Once  the  kayak  reaches  that  waypoint,  with  some  error  tolerance,  the  next  point  of 
the  square  becomes  the  desired  waypoint.  In  the  test  shown,  significant  currents  to 
the  southwest  have  deformed  several  of  the  vehicle  paths  up  to  fifteen  percent  of  the 
leg  length.  If  the  goal  of  the  mission  was  to  navigate  a  straight  line  for  data  sampling, 
this  current  effect  would  be  unsatisfactory,  and  a  controller  with  true  cross-track 
error  regulation  would  be  needed.  The  results  from  this  run  also  serve  to  motivate 
path  and  mission  planning  overall,  because  the  current  influences  the  amount  of  time 
and  energy  needed  to  complete  each  leg. 

Effect  of  Waves  on  Object  Detection  Rate  :  Although  utilizing  the  SICK  2- 
D  laser  has  many  benefits,  it  limitations  are  demonstrated  when  ocean  waves  are 
present.  The  unit  is  fixed  on  the  vehicle,  at  a  height  of  about  35cm  above  the  wa¬ 
terline.  Figure  2b  shows  the  projected  motion  of  a  single  laser  beam  fixed  to  the 
vehicle,  and  with  the  vehicle  heading  ranging  from  [0  —  90]°  .  The  beam  spends  a 
fraction  of  time  below  the  surface  of  the  water,  leading  to  no  return.  Other  points 
are  well  above  the  water  surface,  perhaps  yielding  a  return  from  the  superstructure 
of  a  vessel  instead  of  the  desired  hull.  Figure  3c  shows  the  overall  performance  of 
our  cluster-based  detection,  as  a  function  of  range.  In  relatively  calm  waters,  good 
hit  rates  can  be  found  at  about  half  the  specified  range  of  the  sensor,  but  in  waves 
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Fig.  2.  (a)  The  GPS  log  data  for  an  ASC  being  asked  to  navigate  the  perimeter  of  a  square  under 
waypoint  control.  The  distortion  of  the  paths  are  due  to  currents,  (b)  Projections  of  a  single  laser 
beam  onto  a  constant-radius  sheet  showing  the  effects  of  large  roll  and  pitch  motions  of  the  craft, 
at  about  1Hz. 


virtually  no  hits  are  obtained  outside  20m.  The  observations  are  based  only  on  the 
data  that  was  obtained  during  the  runs,  as  we  did  not  do  a  systematic  study  of  the 
sensor  clustering  characteristics  as  a  control  experiment.  We  note  that  mounting  the 
unit  on  a  gimbal  could  take  out  some  of  the  roll  and  pitch  effects  seen. 


3  Online  Navigation  Approach 

The  purpose  of  this  work  is  to  devise  a  motion  strategy  that  enables  safe  navigation 
along  a  desired  direction  for  an  ASC  using  only  local  2-D  range  readings  in  the 
presence  of  unknown  ocean  currents  and  surface  waves  from  nearby  boats.  At  each 
step  the  ASC  estimates  the  relative  obstacle  position  and  motion  and  subsequently 
chooses  a  direction  that  avoids  collision.  The  approach  is  to  follow  the  sense-plan- 
act  paradigm  at  each  step  at  a  high  frequency. 

The  basic  model  of  the  ASC  is  that  of  a  point  with  controllable  direction  and  a 
maximum  powered  velocity.  A  major  difference  between  our  application  and  ter¬ 
restrial  robotics  is  the  ability  of  the  uncertain  environmental  factors,  such  as  wind 
and  currents,  to  move  the  ASC  in  any  arbitrary  direction.  We  model  the  velocity 
vector  of  the  vehicle  Vasc  by  a  simple  superposition  of  the  velocity  arising  out  of 
environmental  factors,  V drift,  and  the  velocity  due  to  the  ASC’s  own  propulsion, 

V thrust  • 

Vase  =  V drift  T~  Vfhrust  (1) 

We  represent  the  world  in  terms  of  clusters,  which  are  determined  from  the  scan 
data  by  a  simple  thresholding  based  range  segmentation.  As  detected  from  the  laser 
data  range  and  angle  data,  any  one  obstacle  is  considered  to  be  a  single  cluster,  so 
that  it  has  a  starting  point  and  an  end  point.  Each  such  terminal  point  can  either 
be  an  occlusion  point  that  occludes  the  sensor’s  line  of  sight  visibility,  or  a  range 
point  which  is  the  limit  of  the  cluster  visible  due  to  the  sensor’s  range  limit  (  Fig¬ 
ure  4 a).  The  range  points  are  an  artifact  of  the  sensor  limitations  and  do  not  reflect 
information  about  the  obstacle. 

In  general,  the  occlusion  points  represent  the  shape  characteristics  of  the  silhou¬ 
ette  and  not  of  the  actual  object.  Due  to  this,  the  motion  of  the  occlusion  points 
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Fig.  3.  (a)  The  typical  schematic  environment  with  the  ASC  facing  north  and  surrounded  by  boats 
of  varying  size.  ( b )  The  local  information  available  to  the  ASC  using  the  2-D  laser.  The  ASC  must 
plan  its  path  while  avoiding  dynamic  obstacles,  (c)  The  fraction  of  boat  detections  per  second  vs. 
the  distance.  The  higher  curve  is  for  a  relatively  calm  day,  while  the  lower  curve  is  for  a  choppy 
day.  In  the  second  case,  detection  beyond  20m  was  impossible,  and  even  below  the  20m  mark,  it 
was  less  than  20%. 


(a)  (b)  ( c ) 


Fig.  4.  (< a )  Both  the  end  points  0\s  and  0\e  of  the  cluster  for  Obs\  are  occlusion  points  and  can 
be  used  as  reliable  features  in  a  short  time  duration.  R2E  is  a  range  point  and  does  not  provide  any 
distinctive  information  about  the  obstacle;  ( b )  Occlusion  point  travel  due  to  curvature.  0(T+AT) 
is  the  actual  point,  while  0^T+AT^  is  the  detected  point,  (c)  Occlusion  point  travel  due  to  visual 
discontinuity. 

do  not  exactly  represent  the  motion  of  the  object,  i.e.  the  rotational  motion  of  the 
object  can  change  the  shape  of  the  silhouette  and  give  the  occlusion  points  some  ve¬ 
locity.  However,  the  occlusion  points  can  act  as  distinctive  features  of  the  obstacle, 
however,  under  the  following  conditions: 

Low  obstacle  rotation  rate  :  In  open  water,  the  translational  velocity  of  many 
moving  objects  is  quite  high  compared  to  their  rotational  speed.  Due  to  this,  the 
velocity  of  the  occlusion  point  closely  approximates  the  linear  velocity  of  the 
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moving  objects: 


Vocc/asc  ^obs/asc  “1“  ®obs  *  rocc/obs  ~  T obs/asc 

Here  Vocc/asc  is  the  relative  velocity  of  the  occlusion  point  with  respect  to  the 
ASC,  Vofc/asc  is  the  actual  velocity  of  the  obstacle  with  respect  to  the  ASC,  coobs 
is  the  rotation  rate  of  the  boat,  and  rocc/obs  is  the  radius  vector  from  a  reference 
frame  on  the  obstacle  to  the  occlusion  point. 

Small  radius  of  curvature  :  The  occlusion  points  may  travel  along  the  physical 
object  surface  due  to  its  curvature.  The  distance  error  in  occlusion  point,  however, 
is  usually  much  smaller  than  the  actual  travel  if  the  radius  of  curvature  is  small 
compared  to  the  distance  from  the  sensor.  In  Figure  4 b  let  the  obstacle  move  as 
shown  relative  to  the  sensor.  If  the  radius  of  curvature  is  small,  the  actual  distance 
discrepancy  is  small,  and  Oj+atOt  ~  0't+atOt 
Limited  sharp  edges  :  Depending  on  the  inherent  shape  of  the  obstacle,  the  oc¬ 
clusion  points  may  jump  a  large  distance.  Figure  4c  shows  such  a  case.  This 
sudden  jump  gives  an  erroneous  measure  of  the  motion  of  the  obstacle.  Running 
the  detection  at  high  frequency  and  maintaining  a  short  motion  history  helps  the 
algorithm  recover  from  such  an  error  which  is  unavoidable  unless  the  obstacle  is 
fully  modeled. 

Following  these  assumptions  and  ignoring  the  obstacle  rotation,  the  obstacle 
motion  is  estimated  simply  as  the  average  motion  of  the  two  occlusion  points: 

Vobs  =  (Yocc,s  H-  Vocc,e) 

3.1  Navigation  Algorithm 

In  general,  the  ASC  has  an  underlying  objective  such  as  waypoint  navigation  which 
generates  a  desired  heading.  Our  algorithm  modifies  the  heading  command  in  light 
of  nearby  moving  obstacles  for  collision  avoidance.  The  higher  level  planning  that 
generates  the  desired  heading  command  is  responsible  for  avoiding  local  minima, 
as  the  local  reactive  approach  fails  to  address  it. 

We  plan  in  the  position  space  rather  than  velocity  space  due  to  the  unreliability 
in  velocity  measurements  of  the  ASC  as  well  as  the  obstacles.  Since  the  motion  of 
the  ASC  and  the  environment  are  not  modeled,  we  extrapolate  the  current  velocity 
measurements  in  a  simple  linear  model  for  a  short  duration  AT.  The  position  space 
in  the  planning  horizon  AT  becomes  the  reachable  set  Rat  which  is  the  set  of  all 
positions  that  the  ASC  can  reach  in  time  AT  using  this  linear  model.  Using  the 
simplified  motion  model  in  the  previous  section,  we  can  determine  directions  that 
will  cause  collision  with  nearby  obstacles.  Each  obstacle  corresponds  to  one  or  two 
continuous  sets  of  directions,  termed  as  forbidden  headings,  that  should  be  avoided. 
We  denote  the  forbidden  heading  for  a  given  obstacle  Obs ,  by  Hobs- 

Stationary  case  :  In  Figure  5,  Rat  shows  the  reachable  region.  The  optimal 
velocity  V*  vector  towards  the  goal  position  is  shown  in  Figure  5a.  The  obstacle  is 
represented  by  the  occlusion  points,  Oe  and  Os.  Let  the  ASC  have  a  bounding  radius 
of  Rasc-  To  accommodate  the  size  of  the  ASC,  we  extend  the  cluster  by  this  measure. 
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Fig.  5.  In  the  stationary  case,  the  dilation  of  the  obstacle  creates  forbidden  zones  in  the  heading  of 
the  ASC. 

Without  having  to  consider  the  whole  obstacle,  the  Minkowski  sum  to  represent  the 
obstacle  in  the  configuration  space  is  reduced  to  dilating  the  occlusion  points  by 
Rasc •  The  heading  that  the  ASC  must  avoid  in  order  to  prevent  collision  is  as  shown 
in  Figure  5b  by  the  arc  Hobs •  The  decision  of  moving  past  Oe  or  Os  is  made  by 
choosing  the  shortest  path  to  the  goal.  In  general,  the  final  choice  of  VthrUst  once  the 
Hobs  is  established  depends  on  the  mission  preferences.  The  corrected  ASC  heading 
is  taken  towards  the  corresponding  endpoint  of  Hobs  •  The  same  approach  holds  for 
multiple  obstacles  with  the  introduction  of  multiple  forbidden  regions. 

Dynamic  case  :  As  discussed  earlier  in  (Equation  1),  environmental  factors 
such  as  wind  and  current  can  introduce  an  additional  velocity  V drift  to  the  ASC. 
Also  many  of  the  obstacles  in  a  harbor-like  environment  are  mobile  contributing 
to  the  dynamic  environment  seen  by  the  craft.  The  velocities  of  these  obstacles  are 
unknown  a  priori  and  have  to  be  deduced  from  the  local  range  information.  Let 
us  take  the  case  of  a  single  obstacle  moving  with  unknown  velocity  V0bs ,  while 
the  ASC  drifts  with  the  velocity  V drift-  As  the  sensing  is  done  in  the  egocentric 
frame  of  the  ASC,  it  is  impossible  to  distinguish  between  these.  Let  Vext  represent 
the  uncontrollable  component  of  the  ASC  velocity  towards  the  obstacle,  i.e.  the 
combined  effect  of  the  ASC  drift  and  the  obstacle  motion  V0bs,  Vext  =  V drift  ~  VQbs- 
Note  that  the  ASC  can  only  control  V thrust  \  using  the  onboard  sensors  to  measure  the 
obstacle  velocity  would  give  us  Vext  +  Vthrust  -  The  net  ASC  velocity  with  respect  to 
the  obstacle  and  the  reachable  set  Rat,  is  then  given  by: 


Vase  jobs  —  Vext  T~  Vthrust 

Xt+AT  =  XT  Hb  Vext  A  T  +  Vthrust  A  T 

For  a  constant  estimate  of  Vext  in  the  duration  AT,  Rat  is  shown  in  Figure  6. 

The  choice  of  the  planning  horizon  AT  depends  on  two  factors:  accuracy  of  ve¬ 
locity  estimation,  and  the  distance  to  the  nearest  obstacle.  If  the  predicted  motion  is 
considered  reliable,  the  ASC  can  plan  for  a  much  longer  time  step  with  confidence. 
On  the  other  hand,  if  the  motion  model  is  highy  unpredictable  or  if  the  data  is  spo¬ 
radic,  it  is  advisable  to  plan  for  a  shorter  horizon.  Given  AT,  choosing  the  maximum 
Vthrust  is  usually  desirable  from  the  point  of  view  of  the  mission.  However,  in  cases 
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Fig.  6.  ( a&cb )  Velocity  of  the  ASC  with  respect  to  the  obstacle  Vasc/obs  is  the  vector  sum  of  un¬ 
controllable  velocity  components  affecting  the  obstacle  and  the  ASC,  and  the  controlled  speed. 
The  reachable  state  Rat  after  AT  is  the  circle  shown;  (c-e)  The  relative  velocity  of  the  ASC  with 
respect  to  the  obstacle  is  used  to  translate  the  reachable  set.  The  forbidden  heading  regions  are 
shown  in  red.  (c)  Scenario  when  \Vthrmt  \  <  \Vext\'->  id)  Scenario  when  \Vthrust  \  >  \Vext\'-,  (e)  Forbidden 
regions  for  multiple  moving  obstacles. 


where  the  obstacle  is  too  close,  Rat  is  further  bounded  by  the  minimum  distance  to 
the  obstacle  in  consideration,  i.e.,  Vthrust  =  minty thrust _maxi ditf  (asc,obs)/ AT). 

Estimating  Vext  •  From  our  velocity  definitions,  we  have  Vext  =  Vasc  —  VQhs  — 
Vthrust  where  Vasc  ~  Vobs  =  Vasc/0bs  =  —r.  Here,  r  is  the  range  vector  from  the  ASC 
to  the  obstacle,  and  r  is  the  vector  of  the  time  rates  of  change  of  r’s  components  over 
time.  Vthmst  is  estimated  from  the  thrust  command  on  the  vehicle  or  a  water  velocity 
sensor,  in  union  with  the  compass  heading.  In  the  absence  of  these  estimates,  the 
physical  thrust  can  be  briefly  turned  off,  forcing  VthrUst  =  0-  Note  that  this  use  of 
Vthrust  should  not  be  confused  with  the  circle  radius  in  defining  the  forbidden  an¬ 
gles  described  in  previously.  Here  it  is  used  to  describe  an  actual  measurement  or 
estimate  of  controlled  velocity. 


4  Experiments  at  Selat-Pauh 

Selat  Pauh  was  the  test  site  utilized  during  the  January  2009  experiments  ( Figure  1). 
Selat  Pauh  is  located  off  the  southern  coast  of  Singapore  where  a  significant  amount 
of  ship  traffic  is  seen.  In  addition,  the  site  has  numerous  stationary  structures,  such 
as  buoys  and  oil  rigs,  and  there  are  strong  current  fluctuations  daily.  Overall,  this 
area  is  ideal  for  testing  and  observing  the  harbor  environment. 

Using  the  theory  described,  a  number  of  obstacle  avoidance  tests  were  completed 
as  summarized  in  (Figure  8).  Current  predictions  provided  for  the  experiment  date, 
January  14,  and  a  deployment  photo  are  shown  in  Figure  7. 
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Fig.  7.  (a)  Selat  Pauh  Jan  14  ocean  current  forecast.  The  operational  area  is  shown  by  the  green 
polygon.  The  current  direction  is  shown  by  the  arrows,  and  the  color  shows  the  magnitude  (red  be¬ 
ing  of  the  order  1.6m/s).  Image  courtesy:  N.  Patrikalakis  (CENSAM).  (b)  SICK  Id- 1000  scanning 
laser  being  used  in  rough  waters  on  Jan  14,  for  obstacle  detection. 


Fig.  8.  (a)  Avoiding  a  stationary  boat  ( b )  Avoiding  a  moving  obstacle 

The  weather  conditions  were  challenging  and  tended  towards  strong  winds  and 
a  choppy  sea.  Detection  quality  was  on  a  par  with  that  shown  in  the  lower  curve 
of  Figure  3c,  and  this  explains  why  the  ASC  only  took  avoidance  action  at  a  short 
range.  However,  as  shown,  the  online  approach  was  successful  even  under  these 
dismal  circumstances.  The  algorithm  ran  at  10Hz,  on  a  Mini-ITX  with  1GB  RAM. 

In  Figure  8  the  GPS  position  logs  (the  blue  points)  of  the  ASC  are  broken  into 
two  time  sets  for  clarity.  The  red  points  are  the  laser  hits  from  the  ASC  plotted  in 
the  global  frame.  A  simple  waypoint  based  controller  was  used  to  navigate  from  the 
Start  to  Goal  with  known  GPS  locations.  Since  the  percentage  of  detection  is  so  low, 
20%,  we  averaged  the  laser  data  over  a  moving  window  of  lsec  before  applying  the 
obstacle  detection  algorithm,  improving  the  detection  rate  significantly. 

In  Figure  8 a  the  boat  was  kept  stationary.  We  see  that  initially  the  ASC  follows 
the  V*  direction  to  go  straight  towards  the  goal  before  it  detects  an  obstacle  at  a 
distance  of  about  20m.  The  ASC  motion  is  then  modified  to  go  around  the  boat 
and  as  soon  the  the  obstacle  is  safely  cleared,  it  executes  a  new  V*  direction.  In  the 
second  run  Figure  8b,  the  boat  actively  obstructs  the  path  of  the  ASC,  moving  from 
top  right  corner  of  the  plot  to  right  in  front  of  the  kayak,  from  the  side.  The  ASC 
detects  the  obstacle  and  modifies  its  motion  accordingly.  Such  close  range  dynamic 
obstacle  avoidance  requires  fast  online  algorithms  like  the  one  proposed. 
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5  Conclusion 

In  large-scale  autonomous  vehicle  testing  in  Singapore  Harbor,  we  have  found  that 
strong  currents  and  heavy  traffic  are  serious  robustness  concerns.  Autonomous  vehi¬ 
cles  need  to  have  more  available  speed  and  substantially  increased  energy  storage  in 
order  to  perform  meaningful  missions  in  these  waters.  Path  planning  for  known  cur¬ 
rent  and  robust  control  to  reject  unknown  currents  are  also  critical.  We  have  made 
specific  progress  in  obstacle  avoidance  which,  as  described  here,  is  appropriate  for 
day-to-day  use  to  avoid  fixed  and  slowly-moving  obstacles.  The  main  features  of 
our  algorithm  are  that  it  is  neither  probabilistic  nor  model-based,  and  that  it  is  posed 
in  position  space;  as  a  result,  it  scales  seamlessly  to  situations  with  many  objects, 
and  with  very  low  computational  cost.  In  turn,  our  simple  approach  requires  good 
confidence  in  the  range  data  and  obstacle  detection,  and  for  this  we  have  success¬ 
fully  employed  a  clustering  algorithm.  The  avoidance  behavior  is  demonstrated  for 
detection  rates  under  20%. 

In  future  work,  we  plan  to  test  avoidance  of  faster  moving  obstacles  and  to  use 
vessel  motions  reported  by  AIS.  The  algorithm  can  be  extended  to  formations,  and 
including  range  information  in  the  forbidden  regions  could  lead  to  addtional  trajec¬ 
tories  that  may  be  useful. 
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Trajectory  Design  for  Autonomous  Underwater 
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Abstract  Trajectory  design  for  Autonomous  Underwater  Vehicles  (AUVs)  is  of 
great  importance  to  the  oceanographic  research  community.  Intelligent  planning  is 
required  to  maneuver  a  vehicle  to  high-valued  locations  for  data  collection.  We  con¬ 
sider  the  use  of  ocean  model  predictions  to  determine  the  locations  to  be  visited  by 
an  AUV,  which  then  provides  near-real  time,  in  situ  measurements  back  to  the  model 
to  increase  the  skill  of  future  predictions.  The  motion  planning  problem  of  steering 
the  vehicle  between  the  computed  waypoints  is  not  considered  here.  Our  focus  is 
on  the  algorithm  to  determine  relevant  points  of  interest  for  a  chosen  oceanographic 
feature.  This  represents  a  first  approach  to  an  end  to  end  autonomous  prediction  and 
tasking  system  for  aquatic,  mobile  sensor  networks.  We  design  a  sampling  plan  and 
present  experimental  results  with  AUV  retasking  in  the  Southern  California  Bight 
(SCB)  off  the  coast  of  Los  Angeles. 

1  Introduction 

More  than  three-fourths  of  our  earth  is  covered  by  water,  yet  we  have  explored  less 
than  5%  of  the  aquatic  environment.  Autonomous  Underwater  Vehicles  (AUVs) 
play  a  major  role  in  the  collection  of  oceanographic  data.  To  make  new  discoveries 
and  improve  our  overall  understanding  of  the  ocean,  scientists  must  make  use  of 
these  platforms  by  implementing  effective  monitoring  and  sampling  techniques  to 
study  ocean  up  welling,  tidal  mixing  or  other  ocean  processes.  One  emerging  exam¬ 
ple  of  innovative  and  intelligent  ocean  sampling  is  the  automatic  and  coordinated 
control  of  autonomous  and  Lagrangian  sensor  platforms  [4]. 

As  complex  and  understudied  as  the  ocean  may  be,  we  are  able  to  model  and 
predict  certain  behaviors  moderately  well  over  short  time  periods.  Expanding  our 
modeling  capabilities,  and  general  knowledge  of  the  ocean,  will  help  us  better  ex¬ 
ploit  the  resources  that  it  has  to  offer.  Consistently  comparing  model  predictions 
with  actual  events,  and  adjusting  for  discrepencies,  will  increase  the  range  of  valid¬ 
ity  of  existing  ocean  models,  both  temporally  and  spatially. 
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The  goal  of  this  paper  is  to  present  an  innovative  ocean  sampling  method  that 
utilizes  model  predictions  and  AUVs  to  collect  interesting  oceanographic  data  that 
can  also  increase  model  skill.  Our  motivation  is  to  track  and  collect  daily  informa¬ 
tion  about  an  ocean  process  or  feature  which  has  a  lifespan  on  the  order  of  a  week. 
We  use  an  ocean  model  to  predict  the  behavior  of  an  interesting  artifact,  e.g.,  a  fresh 
water  plume,  over  a  small  time  period,  e.g.,  one  day.  This  prediction  is  then  used  as 
input  to  an  algorithm  that  determines  a  sampling  plan  for  the  AUV(s).  The  AUV(s) 
are  then  retasked  from  a  current  mission  or  deployed.  Afterward,  the  collected  data 
is  assimilated  into  the  ocean  model  and  an  updated  prediction  is  computed.  A  new 
sampling  plan  is  created  and  the  process  repeats  until  the  artifact  is  out  of  range  or 
is  no  longer  of  interest. 

We  motivate  the  work  from  an  oceanographic  perspective  and  provide  a  realistic 
field  application.  Next,  we  briefly  describe  the  ocean  model  and  AUV  used  in  this 
study.  We  discuss  the  waypoint  selection  algorithm  and  present  results  from  a  field 
implementation.  We  conclude  with  future  research  plans. 

The  work  presented  here  serves  as  a  proof  of  concept  for  the  utilization  of  ocean 
model  forecasts  to  design  sampling  missions  for  AUVs  in  particular,  and  aquatic 
mobile  sensor  platforms  in  general,  to  follow  an  ocean  feature  and  collect  data. 

2  Oceanography  Application  and  Ocean  Model 

Microscopic  organisms  are  the  base  of  the  food  chain:  all  aquatic  life  ultimately 
depends  upon  them  for  food.  There  are  a  few  dozen  species  of  phytoplankton  and 
cyanobacteria  that  can  create  potent  toxins  when  provided  with  the  right  conditions. 
Harmful  algal  blooms  (HABs)  can  cause  harm  via  toxin  production,  or  by  their 
accumulated  biomass.  Such  blooms  can  cause  severe  illness  and  potential  death  to 
humans  as  well  as  to  fish,  birds  and  other  mammals.  The  blooms  generally  occur 
near  fresh  water  inlets,  where  large  amounts  of  nutrient  rich,  fresh  water  is  deposited 
into  the  ocean.  This  water  provdes  the  excess  food  to  support  higher  productivity 
and  a  bloom  of  microorganisms.  It  is  of  interest  to  predict  when  and  where  HABs 
may  form,  and  which  coastal  areas  they  may  affect.  Harmful  algal  blooms  are  an 
active  area  of  research  along  the  western  coast  of  the  United  States  and  are  of  large 
concern  for  coastal  communities  in  southern  California.  The  impact  of  HABs  in  this 
region  can  be  seen  in  [7,  10].  With  this  motivation,  we  choose  fresh  water  plumes 
as  an  ocean  feature  for  which  to  design  predictive  tracking  missions. 

The  predictive  tool  utilized  in  this  study  is  the  Regional  Ocean  Model  Sys¬ 
tem  (ROMS)  [9]  -  a  split-explicit,  free-surface,  topography-following-coordinate 
oceanic  model.  We  use  ROMS  because  it  is  an  open  source  ocean  model  that  is 
widely  accepted  and  supported  throughout  the  oceanographic  and  modeling  com¬ 
munities.  Additionally,  the  model  was  developed  to  study  ocean  processes  along  the 
western  U.S.  coast  which  is  our  primary  area  of  study. 

Research  is  currently  ongoing  to  update  and  improve  ROMS  for  the  Southern 
California  Bight  (SCB)1  in  an  effort  to  characterize  and  understand  the  complex  up- 


1  The  SCB  is  the  oceanic  region  contained  within  32°  N  to  34.5°  N  and  —  1 17°  E  to  —121°  E 
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welling  and  current  structure  that  exist  and  drive  the  local  climate.  The  Jet  Propul¬ 
sion  Laboratory  (JPL)  uses  ROMS  to  provide  nowcasts  and  hourly  forecasts  (up  to 
36  hours)  for  Monterey  Bay,  the  SCB  and  Prince  William  Sound,  see  [6]  for  more 
information.  The  JPL  version  of  ROMS  assimilates  HF  radar  surface  current  mea¬ 
surements,  data  from  moorings,  satellite  data  and  any  data  available  from  AUVs 
operating  in  the  area.  Information  regarding  this  specific  version  of  ROMS  and  the 
data  assimilation  process  can  be  found  in  [3]. 

3  Mobile  Sensor  Platform:  AUV 

The  mobile  sensor  platform  used  in  this  study  is  a  Webb 
Slocum  autonomous  underwater  glider,  as  seen  in  Fig.  1. 
(http://www.webbresearch.com)  The  Slocum  glider  is  a  type  of 
AUV  designed  for  long-term  ocean  sampling  and  monitoring 
[8].  These  gliders  fly  through  the  water  by  altering  the  position 
of  their  center  of  mass  and  changing  their  buoyancy.  Due  to  this 
method  of  locomotion,  gliders  are  not  fast  moving  AUVs,  and 
generally  have  operational  velocities  on  the  same  order  of  mag¬ 
nitude  as  oceanic  currents.  The  endurance  and  velocity  charac¬ 
teristics  of  the  glider  make  it  a  good  candidate  vehicle  to  track 
ocean  features  which  have  movements  that  are  determined  by 
currents,  and  that  have  a  residence  time  on  the  order  of  weeks. 

We  utilize  autonomous  gliders  because  our  collaborative  re¬ 
search  group  owns  two  of  them,  and  hence  field  experiments 
can  be  readily  performed.  We  have  upgraded  the  communica¬ 
tion  capabilities  of  our  vehicles  to  take  advantage  of  our  local 
wireless  network;  details  on  this  can  be  found  in  the  concurrent 
article,  [5]. 

Extensive  research  has  been  done  on  glider  dynamics  and  controller  design,  e.g., 
see  [2]  and  the  references  therein.  Thus,  we  do  not  discuss  these  details  nor  the 
trajectory  along  which  the  glider  travels.  We  assume  here  that  the  glider  can  suc¬ 
cessfully  navigate  from  one  location  to  another. 

4  Trajectory  Design 

We  now  present  an  algorithm  which  generates  the  locations  for  the  AUV  to  visit  to 
follow  the  general  movements  of  a  fresh  water  plume  through  the  ocean. 

Considerable  study  has  been  reported  on  adaptive  control  of  single  gliders  and 
coordinated  multi-glider  systems,  see  for  example  [4]  and  the  included  references. 
In  these  papers,  the  trajectories  given  to  the  gliders  were  fixed  patterns  (rounded 
polygons)  that  were  predetermined  by  a  human  operator.  The  adaptive  control  com¬ 
ponent  was  implemented  to  keep  the  gliders  in  an  optimal  position,  relative  to  the 
other  gliders  following  the  same  trajectory.  The  difference  between  the  method  used 
in  [4]  and  the  approach  described  here,  is  that  here  the  sampling  trajectory  is  deter¬ 
mined  by  use  of  the  output  of  ROMS,  and  thus  is,  at  first  glance,  a  seemingly  random 


Fig.  1  He  Ha  Pe, 
one  of  two  USC 
Slocum  gliders,  fly¬ 
ing  a  mission  off  the 
coast  of  Catalina  Is¬ 
land. 
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and  irregular  sampling  pattern.  Such  an  approach  is  a  benefit  to  the  model  and  scien¬ 
tist  alike.  Scientists  can  identify  sampling  locations  based  upon  ocean  measurements 
they  are  interested  in  following,  rather  than  setting  a  predetermined  trajectory  and 
hoping  the  feature  enters  the  transect  while  the  AUV  is  sampling.  Model  skill  is 
increased  by  the  continuous  assimilation  of  the  collected  data;  which  by  choice,  is 
not  a  continuous  measurement  at  the  same  location. 

For  a  fresh  water  plume,  the  low  salinity  and  density  imply  that  this  feature  will 
propagate  through  the  ocean  driven  primarily  by  surface  currents.  A  plume  may 
dissipate  rapidly,  but  can  stay  cohesive  and  detectable  for  up  to  weeks;  we  assume 
the  later  case.  It  is  of  interest  to  track  these  plumes  based  on  the  discussion  in  Sect.  2 
as  well  as  in  [1].  In  addition  to  tracking  the  plume,  it  is  also  important  to  accurately 
predict  where  a  plume  will  travel  on  a  daily  basis.  The  ROMS  prediction  capabilities 
for  a  plume  are  good,  but  model  skill  can  significantly  increase  from  assimilation  of 
in  situ  measurements. 

A  single  Slocum  glider  is  not  optimal  for  the  task  at  hand,  as  it  is  built  for  en¬ 
durance  missions  and  traveling  at  low  velocities.  Hence,  we  can  not  expect  it  to  be 
able  to  collect  samples  over  the  entire  area  of  a  potentially  large  plume.  Thus,  we 
restrict  ourselves  to  visiting  (obtaining  samples  at)  at  most  two  locations  for  each 
hour  of  sampling.  The  primary  location  that  we  are  interesed  in  tracking  is  the  cen¬ 
troid  of  the  plume  extent;  analogous  to  the  eye  of  the  storm.  Optimally,  we  would 
also  like  to  gather  a  sample  on  the  boundary  of  the  plume.  However,  the  glider  may 
not  be  able  to  reach  the  plume  centroid  and  a  point  on  the  boundary  in  a  one  hour 
time  frame.  With  the  given  mission  and  the  tools  at  hand,  we  present  the  following 
trajectory  design  algorithm. 

4.1  Trajectory  Design  Algorithm  Based  on  Ocean  Model 
Predictions 

We  propose  the  following  iterative  algorithm  for  plume  tracking  utilizing  ocen 
model  predictions.  This  is  the  first  known  presentation  of  such  a  technology  chain, 
and  as  such,  is  presented  in  a  simplified  manner.  First,  we  assume  the  glider  trav¬ 
els  at  a  constant  velocity  v.  Let  d  be  the  distance  in  kilometers  that  the  vehicle  can 
travel  in  a  given  time.  We  neither  consider  vehicle  dynamics  nor  the  effect  of  ocean 
currents  upon  the  vehicle  in  this  study;  these  are  areas  of  ongoing  research.  Also,  we 
only  consider  a  2-D  planar  problem  as  far  as  the  waypoint  computation  is  concerned. 

The  input  to  the  trajectory  design  algorithm  is  a  set  of  points,  Eft  (referred  to  as 
drifters)  that  determine  the  initial  extent  of  the  plume,  and  hourly  predictions  of  the 
location  of  each  point  in  3)  for  a  set  duration.  For  the  points  in  Q),  we  compute  the 
convex  hull  as  the  minimum  bounding  ellipsoid,  Eq.  The  centroid  of  this  ellipsoid, 
Co,  is  the  start  point  of  the  survey.  Next,  we  consider  the  predicted  locations  of  Of 
after  one  hour,  .  The  centroid  of  is  C\ ;  the  centroid  of  the  minimum  bounding 
ellipsoid  E\.  The  algorithm  computes  dg(Co,Ci),  the  geographic  distance  from  Co 
to  Ci.  Given  upper  and  lower  bounds  du  and  d/,  resp.,  if  d/  <  dg(Co,Ci)  <  dM,  the 
trajectory  is  simply  defined  as  the  line  CoCi.  If  dg(Co,Ci)  <  d/,  the  algorithm  first 
checks  to  see  if  there  exists  a  point  p  E  E\  U  such  that 
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di<dg(Co,p)+dg{C\,p)  <du.  (1) 

If  such  a  point  exists,  the  trajectory  is  defined  as  the  line  Cop  followed  by  the  line 
pC\ .  If  the  set  of  points  p  G  E\  U  @)\  which  satisfy  Eq.  1  is  empty,  then  the  algorithm 
computes  the  locus  of  points,  j£f  =  {p*  G  | dg(Co,p)  +dg(p,C\)  =d}.  This  locus 
Jzf ,  by  definition,  defines  an  ellipse  with  focii  Co  and  C\ .  We  then  choose  a  random 
point  p*  G  j£f  as  another  location  for  sampling.  Here,  the  trajectory  is  the  line  Cop* 
followed  by  the  line  p*C\.  If  dg(Co,Ci)  >  du ,  the  algorithm  aborts  as  the  plume  is 
traveling  too  fast  for  the  chosen  vehicle.  The  algorithm  then  repeats  this  process  for 
the  defined  duration  of  tracking.  This  selection  process  of  waypoints  for  the  AUV  to 
visit  to  track  the  plume  is  presented  in  Algorithm  1 .  The  overall  iterative  process  to 


Algorithm  1  Waypoint  Selection  Algorithm  Based  on  Ocean  Model  Predictions 
Require:  Hourly  forecasts,  %  for  a  set  of  points  defining  the  initial  plume  condition  and  its 
movement  for  a  period  of  time,  T . 

for  0  <  i  <  T  do 

Compute  Ci,  the  centroid  of  the  minimum  bounding  ellipsoid  Ei  of  the  points 

end  for 

while  0  <  i  <  T  —  1  do 

if  di  <  dg(Ci,Ci+ 1)  <  du  then 
The  trajectory  is  QQ+i- 

else  if  dg(Ci,Ci+\)  <  di  and  3p  G  Ei  U  %  such  that  di  <  dg(Q,p)  -\-dg(p,Q+i)  <  du.  then 
The  trajectory  is  Cip  followed  by  pC[+  \ . 

else  if  dg(Ci,Q+ 1)  <  di  and  {p  e  Ei  U  %\di  <  dg(Q,p)  +dg(p,Ci+ 1)  <  du}  =  0.  then 
Compute  Jz?  =  {p*  G  Jit? \dg(Co,p)  +  dg(p,C\)  =  d},  select  a  random  p*  G  Jzf  and  define 
the  trajectory  as  Qp*  followed  by  1 
else  if  dg(Ci,Ci+] )  >  du  then 

Stop  the  algorithm.  The  plume  is  moving  too  fast  for  the  selected  AUV. 

end  if 
end  while 


design  an  implementable  plume  tracking  strategy  based  on  ocean  model  predictions 
is  given  in  Algorithm  2. 


Algorithm  2  Ocean  Plume  Tracking  Algorithm  Based  on  Ocean  Model  Predictions 
Require:  A  significant  fresh  water  plume  is  detected  via  direct  observation  or  remotely  sensed 
data  such  as  satellite  imagery. 

repeat 

A  set  of  points  is  chosen  which  determine  the  current  extent  of  the  plume. 

Input  to  ROMS. 

ROMS  produces  an  hourly  forecast  for  all  points  in  Q>. 

Input  hourly  forcast  for  Q)  into  the  trajectory  design  algorithm. 

Execute  the  trajectory  design  algorithm  (see  Sect.  1). 

Uploaded  computed  waypoints  to  the  AUV. 

AUV  executes  mission. 

The  AUV  sends  collected  data  to  ROMS  for  assimilation  into  the  model, 
until  Plume  dissipates,  travels  out  of  range  or  is  no  longer  of  interest. 
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Remark  1.  In  the  SCB,  a  vertical  velocity  profile  of  ocean  current  is  generally  not 
constant.  Since  the  plume  propagates  on  the  ocean  surface  (1  —  3  m)  and  the  glider 
operates  at  depths  of  60—100  m,  it  is  not  valid  to  assume  that  they  are  subjected  to 
the  same  current  regime,  in  both  velocity  and  direction.  Thus,  it  may  be  possible  for 
a  plume  to  outrun  a  slow-moving  vehicle  (i.e.,  dg(Q,Ci+i)  >  du). 

5  Implementation  and  Field  Experiments  in  the  SCB 

The  rainy  season  in  southern  California  runs  from  November  to  March.  During 
this  time,  storm  events  cause  large  runoff  into  local  area  rivers  and  streams,  all 
of  which  empty  into  the  Pacific  Ocean.  Two  major  rivers  in  the  Los  Angeles  area, 
the  Santa  Ana  and  the  Los  Angeles  River,  input  large  fresh  water  plumes  to  the 
SCB.  Such  plumes  have  a  high  liklihood  of  producing  HAB  events.  We  deployed  a 
Webb  Slocum  glider  into  the  SCB  on  February  17,  2009  to  conduct  a  month-long 
observation  and  sampling  mission.  For  this  deployment,  the  glider  is  programmed 
to  execute  a  zig-zag  pattern  mission  along  the  coastline,  as  depicted  in  Fig.  2,  by 
navigating  to  each  of  the  six  waypoints  depicted  by  the  red  and  black  bullseyes. 
Figure  2  also  delineates  the  20  m  and  30  m  isobaths,  given  by  the  green  and  red 
lines,  respectively. 

Unfortunately,  weather  and  remote  sensing  devices  did  not  cooperate  to  produce 
a  rain  event  along  with  a  detectable  fresh  water  plume,  so  we  were  unable  to  retask 
the  glider  to  track  a  real  plume  by  use  of  Algorithm  2.  Instead,  we  defined  a  pseudo¬ 
plume  Of  with  15  initial  drifter  locations  to  demonstrate  the  proof  of  concept  of  this 
research.  The  pseudo  plume  is  given  by  the  blue  line  in  Fig.  3. 


Fig.  2  Preset  waypoints,  depicted  with  red  and  Fig.  3  Plume  (blue  line),  computed  waypoints 
black  bullseyes  and  the  intended  path  of  the  (yellow  diamonds),  and  path  connecting  con- 
glider  given  by  the  magenta  line.  Image  created  secutive  waypoints  (black  line).  Image  created 
by  use  of  Google  Earth.  by  use  of  Google  Earth. 


The  set  Q)  was  sent  to  JPL  and  input  to  ROMS  as  the  initial  plume  condition. 
The  locations  of  the  points  in  were  predicted  for  15  hours.  The  initial  time  and 
location  for  the  beginning  of  this  retasking  experiment  coincided  with  predicted 
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coordinates  of  a  future  glider  communication.  The  pseudo-plume  was  chosen  such 
that  Co  was  near  this  predicted  glider  surfacing  location. 

Based  on  observed  behavior  for  our  vehicle  during  this  deployment,  we  take 
v  =  0.75  km/h,  and  initially  defined  J/  =  0.5  and  du  =  0.8.  The  hourly  predictions 
were  input  to  the  trajectory  design  algorithm  and  a  tracking  strategy  was  generated. 
Due  to  slow  projected  surface  currents  in  the  area  of  study,  the  relative  movement 
of  the  plume  was  quite  small.  To  keep  the  glider  from  surfacing  too  often  and  to 
generate  a  more  implementable  trajectory,  we  opted  to  omit  visiting  consecutive 
centroids.  Instead,  we  chose  to  begin  at  the  initial  centroid,  then  visit  the  predicted 
centroid  of  the  plume  after  five,  ten  and  15  hours,  C5,  C10  and  C15,  respectively. 
Between  visiting  these  sites,  the  algorithm  computed  an  additional  waypoint  for  the 
glider  to  visit.  These  intermediate  waypoints  were  chosen  similarly  to  the  p*  defined 
earlier,  with  d  =  3.75;  the  distance  the  glider  should  travel  in  five  hours.  This  design 
strategy  produced  seven  waypoints  for  the  AUV  to  visit  during  the  15  hour  mission. 
The  waypoints  are  presented  in  Table  1 . 

Note  that  we  include  the  initial  centroid  as  a  waypoint,  since  the  glider  may 
not  surface  exactly  at  the  predicted  location.  Upon  visiting  all  of  the  waypoints  in 
Table  1,  the  glider  was  instructed  to  continue  the  sampling  mission  shown  in  Fig. 
2.  Figure  3  presents  a  broad  overview  of  the  waypoints  in  Table  1,  along  with  a 
path  connecting  consecutive  waypoints.  The  plume  is  delineated  by  the  blue  line  and 
the  waypoints  are  numbered  and  depicted  by  yellow  diamonds.  Note  that  the  glider 
did  not  travel  on  the  ocean  surface  during  this  experiment.  Between  waypoints,  the 
glider  submerges  below  a  set  depth  and  performs  consecutive  dives  and  ascents 
creating  a  sawtooth- shaped  trajectory  as  its  glide  path. 


Table  1  Waypoints  generated  by  the  plume  tracking  algorithm.  Waypoint  numbers  1,3,5  and  7 
are  the  predicted  centroids  of  the  pseudo-plume  at  hours  0, 5, 10  and  15,  respectively. 


Number 

Latitude  (N) 

Longitude  (E) 

Number 

Latitude  (N) 

Longitude  (E) 

1 

33.6062 

-118.0137 

5 

33.6189 

-118.0349 

2 

33.6054 

-118.0356 

6 

33.6321 

-118.0257 

3 

33.6180 

-118.0306 

7 

33.6175 

-118.0361 

4 

33.6092 

-118.0487 

6  Results 

In  the  study  of  path  planning  for  field  robots,  planning  the  trajectory  is  usually  less 
than  half  the  battle,  the  real  challenge  comes  in  the  implementation.  This  is  exagger¬ 
ated  when  dealing  with  underwater  robots  due  to  the  complex  environment.  Next, 
we  present  results  of  an  implementation  of  the  designed  sampling  mission  onto  a 
Slocum  glider  operating  in  the  SCB. 

The  waypoints  given  in  Table  1  were  computed  under  the  assumption  that  the 
mission  would  be  loaded  onto  the  glider  at  a  specific  time  and  approximate  geo¬ 
graphic  location.  The  glider  arrived  and  communicated  at  the  correct  time  and  loca¬ 
tion,  however,  communication  was  aborted  before  the  plume  tracking  mission  could 
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be  uploaded.  We  were  able  to  establish  a  connection  two  hours  later  at  a  different 
location,  and  successfully  upload  the  mission  file;  this  location  is  the  red  droplet 
labeled  1  in  Fig.  4.  We  opted  to  not  visit  waypoint  1  based  on  the  location  of  the 
glider  and  to  get  the  glider  back  on  schedule  to  track  the  plume.  Figures  4  and 
5  present  magnified  images  of  Fig.  3,  where  computed  waypoints  are  the  yellow 
diamonds  and  the  red  droplets  are  the  actual  locations  visited  by  the  glider. 


Fig.  4  Computed  waypoints  (yellow  diamonds)  and  actual  glider  locations  (red  droplets).  Image 
created  by  use  of  Google  Earth. 

We  were  able  to  sucessfully  generate  a  plan  and  retask  a  deployed  glider  to  follow 
an  ocean  feature  for  15  hours.  It  is  clear  from  the  data  that  consideration  has  to  be 
made  for  glider  dynamics  and  external  forces  from  the  ocean  in  the  trajectory  design 
algorithm.  This  is  an  area  of  active  research.  The  motivation  of  this  research  is  to 
follow  plumes  through  the  ocean  via  centroid  tracking. 

One  element  that  we  have  neglected  to  discuss  up  to  this  point  is  that  we  have 
no  metric  for  comparison.  In  particular,  when  we  reach  a  predicted  centroid,  we 
do  not  have  a  method  to  check  whether  or  not  the  plume  centroid  was  actually 
at  that  location.  We  are  planning  experiments  to  deploy  actual  Lagrangian  drifters 
to  simulate  a  plume.  This  will  give  a  comparison  between  the  ROMS  prediction 
and  the  actual  movement  of  the  drifters.  This  also  provides  a  metric  to  determine 
the  accuracy  of  the  prediction  and  the  precision  of  the  AUV.  Another  component 
omitted  from  earlier  discussion  is  time.  When  tracking  a  moving  feature,  a  prediced 
waypoint  contains  time  information  as  well  as  location.  For  this  implementation,  the 
glider  began  the  mission  at  0302Z  and  ended  at  1835Z;  a  total  time  of  15.55  hours. 
Due  to  external  influences,  arrival  at  a  few  waypoints  was  not  at  the  predicted  time. 
Resolving  this  matter  is  contained  within  the  addition  of  external  forces,  and  is  the 
subject  of  ongoing  work. 
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Fig.  5  Computed  waypoints  (yellow  diamonds)  and  actual  glider  locations  (red  droplets).  Image 
created  by  use  of  Google  Earth. 


7  Conclusions  and  Future  Work 

Designing  effective  sampling  strategies  to  study  ocean  phenomena  is  a  challeng¬ 
ing  task  and  can  be  approached  from  many  different  angles.  Here,  we  presented  a 
method  to  exploit  multiple  facets  of  technology  to  achieve  our  goal.  Utilizing  an 
ocean  model  and  an  AUV,  we  were  able  to  construct  a  technology  chain  which  out¬ 
puts  a  path  to  follow  a  fresh  water  plume  centroid  for  a  chosen  period  of  time.  The 
successful  field  experiment  presented  here  required  the  cooperation  and  communi¬ 
cation  between  many  individuals.  Retasking  an  autonomous  glider  remotely  while 
it  is  in  the  field  involves  patience,  determination  and  many  resources.  In  a  period  of 
less  than  two  hours,  we  were  able  to  decide  to  retask  the  glider,  delineate  a  plume  in 
the  ocean,  use  ROMS  to  generate  a  prediction,  generate  an  implementable  tracking 
strategy,  create  a  glider  mission  file  and  have  it  ready  to  upload  to  the  glider.  This 
paper  has  demonstrated  that  we  have  implemented  the  collaboration  and  technology 
chain  required  to  perform  complex  field  experiments.  The  work  now  is  to  improve 
upon  the  waypoint  generation  algorithm  and  extend  it  to  design  implementable  3-D 
trajectories. 

The  main  implementation  issue  is  the  ability  of  the  glider  to  accurately  navi¬ 
gate  to  a  given  waypoint.  This  is  a  direct  result  of  the  waypoint  selection  algorithm 
only  solving  the  2-D  problem,  and  ignoring  the  dynamics  of  the  glider  and  the  com¬ 
plex  ocean  environment.  Details  on  how  to  implement  robustness  and  generate  more 
complex  sampling  missions  are  outside  the  scope  of  this  paper.  Areas  of  ongoing  re- 
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search  include  plans  to  incorporate  the  kinematic  and  dynamic  models  of  the  glider 
and  extend  this  from  a  planar  to  a  3-D  motion  planning  algorithm.  Also,  we  plan 
to  incorporate  a  3-D  current  output  of  ROMS  to  plan  a  trajectory  that  exploits  the 
currents  to  aid  the  locomotion  of  the  glider.  A  more  immediate  step  is  to  incorpo¬ 
rate  multiple  AUVs,  which  leads  to  the  development  of  an  optimization  criterion  on 
which  vehicle  is  best  suited  for  a  certain  mission  or  to  visit  a  chosen  waypoint.  A 
long-term  goal  is  to  facilitate  autonomy  for  the  entire  system,  leaving  the  human  in 
the  control  loop  as  a  fail-safe. 
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AUV  Benthic  Habitat  Mapping  in  South  Eastern 
Tasmania 
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Abstract  This  paper  describes  a  two  week  deployment  of  the  Autonomous  Under¬ 
water  Vehicle  (AUV)  Sirius  on  the  Tasman  Peninsula  in  SE  Tasmania  and  in  the 
Huon  Marine  Protected  Area  (MPA)  to  the  South  West  of  Hobart.  The  objective  of 
the  deployments  described  in  this  work  were  to  document  biological  assemblages 
associated  with  rocky  reef  systems  in  shelf  waters  beyond  normal  diving  depths.  At 
each  location,  multiple  reefs  were  surveyed  at  a  range  of  depths  from  approximately 
50  m  to  100  m  depth.  We  illustrate  how  the  AUV  based  imaging  complements  ben¬ 
thic  habitat  assessments  to  be  made  based  on  the  ship-borne  swath  bathymetry.  Over 
the  course  of  the  10  days  of  operation,  19  dives  were  undertaken  with  the  AUV 
covering  in  excess  of  70  linear  kilometers  of  survey  and  returning  nearly  160,000 
geo-referenced  high  resolution  stereo  image  pairs.  These  are  now  being  analysed  to 
describe  the  distribution  of  benthic  habitats  in  more  detail. 


1  Introduction 

The  Autonomous  Underwater  Vehicle  (AUV)  Sirius  was  part  of  a  two  week  expe¬ 
dition  in  October,  2008,  whose  objective  was  to  describe  biological  assemblages 
associated  with  rock  reef  systems  in  deep  shelf  waters  on  the  Tasman  Peninsula 
in  SE  Tasmania  and  in  the  Huon  Marine  Protected  Area  (MPA)  to  the  South  West 
of  Hobart.  Detailed  multibeam  sonar  bathymetry  data  were  previously  collected  by 
Geoscience  Australia  using  a  Simard  EM3002  multibeam  sonar  system,  Applannix 
motion  sensor  and  C-Nav  GPS  to  provide  high-resolution  Digital  Elevation  Maps 


Stefan  B.  Williams,  Oscar  Pizarro  and  Michael  Jakuba 

Australian  Centre  for  Field  Robotics,  Uni.  of  Sydney,  Sydney,  NSW  2006,  Australia 
e-mail:  stefanw,  o.pizarro,  m.jakuba@acfr.usyd.edu.au 


Neville  Barrett 

Tasmanian  Aquaculture  and  Fisheries  Institute,  Uni.  of  Tasmania,  Hobart,  Tas  7001,  Australia 
e-mail :  N eville.B arrett @  utas . edu.au 


1 


2 


Stefan  B.  Williams,  Oscar  Pizarro,  Michael  Jakuba,  Neville  Barrett,  Paper-ID  48 


(a)  Tasman  Peninsula  (b)  AUV  Sirius 


Fig.  1  (a)  The  site  of  survey  work  on  the  Tasman  Peninsula  in  the  South  East  of  Tasmania,  (b)  The 
vehicle  on-board  the  R/V  Challenger  prior  to  deployment.  The  dolerite  cliffs  of  the  peninsula  can 
be  seen  in  the  background. 


(DEMs)  of  the  study  areas.  The  DEMs  were  used  to  determine  suitable  AUV  survey 
locations  and  to  identify  any  hazards  to  operation.  At  each  location,  multiple  reefs 
were  surveyed  at  a  range  of  depths  from  approximately  50  m  to  100  m  depth.  Where 
distinct  ectones  (e.g.  reef  to  sand)  are  present,  transects  were  designed  to  cross  tran¬ 
sition  zones  and  help  determine  the  uniqueness  of  ectonal  assemblages.  Replication 
depended  upon  site  logistics,  however,  dive  profiles  were  designed  to  provide  suf¬ 
ficient  replication  to  quantitatively  determine  abundances  of  key  species/features 
within  depth  strata,  within  reefs,  between  reefs  (km  to  100  km  scale),  and  between 
differing  levels  of  reef  complexity. 


2  Biodiversity  Hub 

While  inshore  reef  systems  are  relatively  easy  to  access  and  describe  using  meth¬ 
ods  such  as  dive  surveys,  offshore  systems  have  remained  relatively  unknown  be¬ 
cause  of  the  expense  and  complexity  of  available  survey  methods.  A  recent  and 
very  significant  development  contributing  to  our  understanding  of  the  physical  en¬ 
vironment  of  shelf  habitats  has  been  multibeam  sonar  and  the  interpretation  of  its 
associated  backscatter.  This  has  opened  up  opportunities  for  developing  predictive 
capacity  in  this  field  where  matching  biological  datasets  are  available.  However,  the 
effectiveness  of  this  technique  has  yet  to  be  fully  tested  as  an  appropriate  surro¬ 
gate  for  predicting  patterns  of  biodiversity  because  of  the  lack  of  matching  biolog¬ 
ical  datasets  collected  at  the  same  spatial  scales  and  locations  as  fine  scale  acous- 
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tic  surveys.  In  this  context,  the  AUV  deployments  reported  on  here  were  part  of 
a  multi-disciplinary  experimental  program  in  eastern  Tasmania,  where  the  analy¬ 
sis  of  covariance  is  to  be  undertaken  on  co-located  fine-resolution  seabed  habitat 
data,  provided  by  the  EM3002  multibeam  sonar  coverages,  and  biological  datasets 
collected  at  similar  spatial  scales  by  Remotely  Operated  Vehicles  (ROVs),  Baited 
Underwater  Video  systems  (BRUVs),  towed  video  and  the  AUV  This  research  is 
being  undertaken  by  the  University  of  Tasmania  and  Geoscience  Australia  as  part 
of  the  Marine  Biodiversity  Research  Hub,  which  is  a  collaborative  program  funded 
under  the  Commonwealth  Government’s  Commonwealth  Environmental  Research 
Facilities  (CERF)  Program.  Within  this  Hub,  several  interrelated  projects  are  de¬ 
signed  to  develop  and  test  appropriate  surrogates  for  biodiversity  and  incorporate 
these  into  an  advanced  predictive  framework  that  covers  a  range  of  spatial  scales 
[2]. 

The  AUV  data  enables  a  finer- scale  coupling  of  biological  datasets  with  multi¬ 
beam  bathymetry  than  data  collected  through  the  use  of  ROVs,  BRUVs  and  towed 
video  alone,  because  of  geo-referencing  errors  associated  with  USBL  tracking  sys¬ 
tems.  This  additional  data  is  expected  to  allow  scale  matching  errors  to  be  examined 
in  more  detail  and  allow  surrogacy  to  be  examined  at  the  finest  possible  scale.  Ul¬ 
timately,  CERF  researchers  expect  to  be  able  to  compare  the  relative  efficiency  of 
using  AUV,  ROV  and  towed  video  systems  for  shelf  habitat  biological  surveys.  In 
addition,  the  high  resolution  images  produced  by  the  AUV  are  expected  to  signif¬ 
icantly  enhance  the  ability  to  identify  taxa,  adding  finer  taxonomic  resolution,  and 
hence  value,  to  the  data  collection. 


3  AUV-based  Benthic  Habitat  Mapping 

One  of  the  key  features  of  the  present  cruise  was  the  availability  of  a  high  resolution 
optical  imaging  AUV.  The  high  spatial  resolution  and  capacity  to  geo-reference  the 
resulting  imagery  provides  an  invaluable  mechanism  for  observing  the  extent  and 
composition  of  particular  benthic  habitats.  In  this  case,  these  data  allow  for  post 
cruise  analysis  to  validate  habitat  classification  based  on  backscatter  and  slope  data 
extracted  from  the  ship-borne  multibeam  bathymetry. 

AUVs  are  becoming  significant  contributors  to  modem  oceanography,  increas¬ 
ingly  playing  a  role  as  a  complement  to  traditional  survey  methods.  Large,  fast  sur¬ 
vey  AUVs  can  provide  high  resolution  acoustic  multibeam  and  sub-bottom  data  by 
operating  a  few  tens  of  meters  off  the  bottom,  even  in  deep  water  [6,  10].  High 
resolution  optical  imaging  requires  the  ability  to  operate  very  close  to  potentially 
mgged  terrain.  The  Autonomous  Benthic  Explorer  (ABE)  has  helped  increase  our 
understanding  of  spreading  ridges,  hydrothermal  vents  and  plume  dynamics  [16] 
both  using  acoustics  and  vision.  The  AUV  SeaBED  [14]  is  primarily  an  optical 
imaging  AUV,  used  in  a  diverse  range  of  oceanographic  cruises  including  coral  reef 
characterization  [13]  and  surveys  of  ground  fish  populations  [1].  Recently,  the  re¬ 
lated  AUVs  Puma  and  Jaguar  searched  for  hydrothermal  vents  under  the  artic  ice 
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[7].  Other  AUV  systems  have  been  used  to  explore  biophysical  coupling,  including 
mapping  harmful  algal  blooms  [11]  and  characterising  up- welling  around  canyons 
[12] 

The  University  of  Sydney’s  Australian  Centre  for  Field  Robotics  operates  an 
ocean-going  AUV  called  Sirius  capable  of  undertaking  high  resolution,  geo-referenced 
survey  work  [15].  This  platform  is  a  modified  version  of  the  WHOI  SeaBED  vehicle. 
This  class  of  AUV  has  been  designed  specifically  for  relatively  low  speed,  high  reso¬ 
lution  imaging  and  is  passively  stable  in  pitch  and  roll.  The  submersible  is  equipped 
with  a  full  suite  of  oceanographic  sensors  including  a  high  resolution  stereo  cam¬ 
era  pair  and  strobes,  multibeam  sonar,  a  depth  sensor,  Doppler  Velocity  Log  (DVL) 
including  a  compass  with  integrated  roll  and  pitch  sensors,  Ultra  Short  Baseline 
Acoustic  Positioning  System  (USBL),  forward-looking  obstacle  avoidance  sonar,  a 
conductivity/temperature  sensor  and  combination  uorometer/scattering  sensor  to 
measure  chlorophyll-a,  turbidity  and  dissolved  organic  matter.  The  on-board  com¬ 
puter  logs  sensor  information  and  runs  the  vehicle’s  low-level  control  algorithms. 
Sirius  is  part  of  the  Integrated  Marine  Observing  System  (IMOS)  AUV  Facility, 
with  funding  available  on  a  competitive  basis  to  support  its  deployment  as  part  of 
marine  studies  in  Australia. 

Navigation  underwater  is  challenging  because  electromagnetic  signals  attenu¬ 
ate  strongly  with  distance.  Absolute  position  estimates  such  as  those  provided  by 
GPS  are  therefore  not  readily  available.  Simultaneous  Localisation  and  Mapping 
(SLAM)  is  the  process  of  concurrently  building  a  feature  based  map  of  the  en¬ 
vironment  and  using  this  map  to  obtain  estimates  of  the  location  of  the  vehicle. 
The  SLAM  algorithm  has  seen  a  considerable  amount  of  interest  from  the  mobile 
robotics  community  as  a  tool  to  enable  fully  autonomous  navigation  [3,  4].  Our 
current  work  has  concentrated  on  efficient,  stereo  based  Simultaneous  Localisation 
and  Mapping  and  dense  scene  reconstruction  suitable  for  creating  detailed  maps  of 
sea  oor  survey  sites  [8,  9].  These  novel  approaches,  based  on  Visual  Augmented 
Navigation  (VAN)  techniques  [5],  enable  the  complexity  of  recovering  the  state  es¬ 
timate  and  covariance  matrix  in  a  VAN  framework  to  be  managed.  This  has  allowed 
these  algorithms  to  run  on  significantly  larger  mapping  problems  than  was  previ¬ 
ously  feasible.  These  techniques  have  been  used  to  renavigate  the  estimated  vehicle 
trajectories  using  the  data  collected  for  this  paper. 

A  typical  dive  will  yield  several  thousand  geo-referenced  overlapping  stereo 
pairs.  While  useful  in  themselves,  single  images  make  it  difficult  to  appreciate  spa¬ 
tial  features  and  patterns  at  larger  scales.  It  is  possible  to  combine  the  SLAM  tra¬ 
jectory  estimates  with  the  stereo  image  pairs  to  generate  3D  meshes  and  place  them 
in  a  common  reference  frame  [15].  The  resulting  composite  mesh  allows  a  user  to 
quickly  and  easily  interact  with  the  data  while  choosing  the  scale  and  viewpoint 
suitable  for  the  investigation.  Spatial  relationships  within  the  data  are  preserved  and 
scientists  can  move  from  a  high  level  view  of  the  environment  down  to  very  detailed 
investigation  of  individual  images  and  features  of  interest  within  them.  This  is  a 
useful  tool  for  the  end  user  to  develop  an  intuition  of  the  scales  and  distributions  of 
spatial  patterns,  even  before  any  automated  interpretation  is  attempted.  Examples  of 
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the  output  of  the  3D  reconstructions  for  dives  undertaken  on  this  cruise  are  included 
below. 


4  Deployments 

The  deployments  undertaken  over  the  course  of  the  10  day  cruise  in  October  2008 
were  on  shelf  reef  habitats  at  depths  of  between  50  and  100  m  in  eastern  Tasmanian 
waters  and  in  estuarine  waters  in  30  to  50  m  deep  in  the  Huon  MPA  and  around 
Port  Arthur,  Tasmania.  The  vehicle  was  deployed  on  19  dives  over  the  10  days  of 
operation.  During  the  course  of  these  dives,  the  vehicle  covered  in  excess  of  70  lin¬ 
ear  kilometers  of  survey  and  collected  nearly  160,000  high  resolution  stereo  image 
pairs.  Each  dive  ranged  between  2.5  km  and  6.5  km  in  total  length  with  an  average 
of  3.7  km  covered  per  dive  travelling  at  a  speed  of  0.5  m/s  or  approximately  1  knot. 
Table  1  shows  summary  statistics  for  the  dives  undertaken  during  the  course  of  this 
cruise.  Dives  1  through  4  were  calibration  runs  undertaken  prior  to  the  scientific 
missions  and  are  not  show  here. 

Figure  2  shows  the  AUV  dive  profiles  overlaid  on  the  previously  collected  ship- 
borne  mutlibeam  bathymetry,  focusing  on  the  Tasman  Peninsula  deployments.  The 
AUV  dive  profiles  were  targeting  particular  rocky  reef  structures  identified  in  the 
bathymetry  derived  from  ship-borne  multibeam  surveys  undertaken  prior  to  the 


Table  1  AUV  Tasmania  Dive  Summary 

Dive  Lat  Long  Max  Depth  Distance  Avg.  Alt.  No.  Stereo  pairs 


5  -43.0615 

147.9648 

[m] 

67.0 

[m] 

5625 

[m] 

2.00 

12262 

6  -43.0631 

147.9825 

66.3 

2947 

2.00 

7256 

7  -43.084558 

147.974086 

77.9 

4774 

2.00 

11278 

8  -43.094119 

148.024267 

85.9 

2817 

2.10 

7262 

9  -43.120019 

148.05373 

90.9 

2652 

2.08 

6336 

10  -43.119798 

148.047008 

88.5 

2717 

2.06 

6406 

11  -43.119126 

148.038401 

83.3 

2660 

1.97 

6727 

12  -43.120581 

148.03754 

84.0 

2875 

2.00 

6787 

13  -43.123972 

148.053974 

96.3 

2759 

2.16 

6870 

14  -43.119887 

148.045257 

89.5 

2995 

2.06 

7737 

15  -43.040462 

147.955845 

57.8 

4559 

1.99 

10563 

16  -42.956996 

148.005154 

76.4 

6542 

2.05 

15521 

17  -43.165023 

147.874164 

52.7 

4290 

1.99 

9768 

18  -43.18674 

147.88763 

32.4 

3889 

2.02 

8863 

19  -42.913151 

148.003898 

60.0 

6300 

2.00 

15162 

20  -43.084558 

147.974086 

76.9 

2780 

2.04 

6564 

21  -43.29307 

147.129389 

45.9 

4619 

2.00 

9655 

22  -43.327701 

147.166552 

35.8 

1153 

1.51 

2548 

23  -43.270816 

147.124965 

33.0 

3200 

1.50 

1623 

TOTAL 

70153 

159188 
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(a)  Tasman  peninsula  deployments 


(b)  Fortescue  Bay 


Fig.  2  Tasmanian  Deployments  2008  illustrating  the  prior  multibeam  bathymetry  available  with 
AUV  dive  profiles  overlaid.  The  AUV  profiles  are  colour  coded  by  depth. 


cruise.  Figure  3  (a)  shows  two  AUV  dive  profiles  over  the  bathymetry  at  OHara  reef 
(seen  in  the  middle  of  Figure  2  (b))  with  high  resolution  multibeam  data  collected 
by  the  vehicle  from  an  altitude  of  20  m  over  the  eastern  edge  of  the  reef  embedded 
in  the  figure.  Figure  3  (b)  shows  details  of  one  of  the  three  dimensional  sea  oor 
reconstructions  generated  using  the  combined  SLAM  and  stereo  meshes  for  one  of 
these  dives. 
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Figure  4  shows  dive  profiles  for  six  dives  undertaken  around  the  Hippolytes,  a 
rocky  island  located  group  approximately  4  km  offshore.  These  areas,  in  approxi¬ 
mately  90  m  of  water,  were  expected  to  feature  high  levels  of  biodiversity.  Finally, 
Figure  5  shows  high  resolution  multibeam  bathymetry  collected  by  the  vehicle  in 
the  Huon  MPA  showing  a  series  of  pockmarks  mapped  using  the  vehicle’s  on-board 
multibeam  sonar  at  an  altitude  of  20  m  during  dive  23.  These  pockmarks  had  been 
identified  in  prior  ship  borne  multibeam  surveying  but  were  targeted  by  the  AUV 
to  generate  higher  resolution  bathymetry  as  well  as  undertaking  a  number  of  vi¬ 
sual  passes  at  an  imaging  altitude  of  2  m.  The  visual  imagery  revealed  that  these 
pockmarks,  surrounded  by  muddy  substrate,  were  filled  with  algal  growth.  Further 
investigation  using  divers  is  planned  to  determine  the  nature  of  this  growth. 

The  data  gathered  during  the  AUV  dives  will  form  a  baseline  at  or  near  the  time 
of  protection  for  newly  designated  MPAs.  This  will  allow  changes  related  to  pro¬ 
tection,  time  and  climate  change  to  be  assessed  through  time.  This  dataset  will  un¬ 
derpin  proposals  for  ongoing  research  on  the  respective  MPAs  (Commonwealth  and 
State)  with  respect  to  both  ecosystem  effects  of  fishing  and  climate  change,  in  an 
otherwise  poorly  understood  ecostystem.  The  AUV,  ROV  and  BRUV  surveys  will 
also  provide,  for  the  first  time,  a  detailed  quantitative  inventory  of  the  benthic  fau¬ 
nal  assemblages  associated  with  temperate  shelf  reef  systems  in  this  region,  and  of 
the  MPAs  themselves.  Shelf  reef  assemblages  of  this  region  remain  relatively  un¬ 
described  with  respect  to  the  quantitative  composition  of  dominant  species  or  tax¬ 
onomic  groupings,  therefore  providing  that  initial  baseline  description  represents  a 
significant  contribution  of  this  study.  Like  inshore  systems,  shelf  reef  assemblages  in 
this  region  are  in  uenced  by  fishing  (e.g.  scalefish  and  lobster),  introduced  species 
and  climate  change,  yet  nothing  is  currently  known  of  the  interactions  occurring  in 
these  systems. 


5  Conclusions  and  Future  Work 


This  paper  has  reported  on  an  expedition  taken  to  survey  biological  assemblages 
associated  with  deep  water,  rocky  reef  systems  off  the  coast  of  the  Tasman  peninsula 
and  in  the  Huon  MPA.  Although  the  detailed  analysis  of  the  data  and  comparison 
against  the  benthic  habitat  classification  based  on  sonar  backscatter  and  slope  is  on¬ 
going,  we  have  illustrated  how  the  data  collected  by  the  AUV  is  complementary  to 
ship-borne  multibeam. 

We  are  currently  preparing  to  return  to  Tasmania  in  early  2009  to  undertake  addi¬ 
tional  deployments  associated  with  further  multibeam  swath  mapping  and  to  target 
urchin  barrens  and  scallop  fisheries.  The  impact  of  fishing  and  climate  change  on 
these  poorly  understood  habitats  stands  to  derive  significant  benefit  from  the  de¬ 
tailed,  optical  surveying  capabilities  of  tools  such  as  the  AUV. 
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(a)  Ohara  reef  bathymetry 


(b)  3D  reconstruction 


Fig.  3  (a)  Ohara  reef  bathymetry  with  two  AUV  imaging  dive  profiles  (Dive  7  and  20)  overlaid. 
The  long  axis  of  these  dives  are  2  km  in  length.  The  AUV  dive  profiles  are  colour  coded  by  mea¬ 
sured  depth.  The  colour  coding  between  the  AUV  depth  and  the  bathymetry  are  not  consistent  in 
order  for  the  dive  profile  to  stand  out.  Detailed  bathymetry  collected  by  the  vehicle  from  a  20m 
altitude  at  the  eastern  edge  of  the  reef  is  overlaid  on  the  orginal  ship-borne  bathymetry  showing 
the  interface  between  the  rocky  reef  and  the  deeper,  sandy  substrate.  The  AUV  bathymetric  data  is 
gridded  at  0.5m  resolution,  (b)  Details  of  one  of  the  SLAM  loop  closure  points  identified  during 
Dive  7  at  OHara  reef.  The  vehicle  path  generated  using  SLAM  has  been  used  to  place  the  stereo 
meshes  into  space  and  the  resulting  mesh  has  been  texture  mapped  using  the  images.  As  can  be 
seen,  the  loop  closure  has  been  successful,  resolving  the  detailed  structure  of  the  scene  in  spite  of 
the  vehicle  having  travelled  nearly  2  km  between  the  two  passes  over  this  area. 
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Fig.  4  Dive  profiles  for  dives  8  through  13  around  the  Hippolytes,  a  series  of  islands  located  some 
4  km  offshore  of  the  Tasman  peninsula. 


Fig.  5  A  series  of  pockmarks  mapped  using  the  vehicle’s  on-board  multibeam  sonar  at  an  altitude 
of  20  m  during  dive  23  in  the  Huon  MPA. 
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Abstract  The  operation  of  Autonomous  Underwater  Vehicles  (AUVs)  within  under¬ 
water  sensor  network  fields  provides  an  opportunity  to  reuse  the  network  infrastruc¬ 
ture  for  long  baseline  localisation  of  the  AUV.  Computationally  efficient  localisation 
can  be  accomplished  using  off-the-shelf  hardware  that  is  comparatively  inexpensive 
and  which  could  already  be  deployed  in  the  environment  for  monitoring  purposes. 
This  paper  describes  the  development  of  a  particle  filter  based  localisation  system 
which  is  implemented  onboard  an  AUV  in  real-time  using  ranging  information  ob¬ 
tained  from  an  ad-hoc  underwater  sensor  network.  An  experimental  demonstration 
of  this  approach  was  conducted  in  a  lake  with  results  presented  illustrating  network 
communication  and  localisation  performance. 


1  Introduction 

Tracking  an  Autonomous  Underwater  Vehicle’s  (AUV’s)  position  is  essential  for 
navigation  and  geo-referencing  data  gathered  during  survey  tasks.  Unlike  some 
other  types  of  field  robots,  AUVs  suffer  significant  challenges  in  determining  their 
location  as  GPS  signals  are  only  available  whilst  at  the  surface.  There  are  three 
general  approaches  to  maintaining  an  AUV’s  position  estimate  for  navigation:  (1) 
integration  of  motion  estimates,  (2)  using  acoustic  transponders  as  a  position  refer¬ 
ence,  or  (3)  using  the  environment  as  a  position  reference  [10, 12]. 

Path  integration  approaches  accumulate  motion  estimates  from  either  inertial 
measurement  units,  Doppler  Velocity  Logs  (DVLs)  or  vision  systems  to  determine 
position  relative  to  a  starting  point  [14].  These  by  themselves  suffer  integration  drift 
with  time  and  minimising  this  drift  is  highly  dependent  on  hardware  selection  and 
algorithm  choice. 
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Acoustic  transponder  systems  provide  absolute  position  estimates  and  do  not  suf¬ 
fer  from  error  accumulation  inherent  in  path  integration  approaches.  These  can  ei¬ 
ther  be  characterised  as  long  or  short  baseline  systems.  In  long  baseline  systems, 
the  AUV  uses  ranging  information  from  acoustic  transponders  placed  at  known  lo¬ 
cations  to  determine  its  position.  Short  baseline  systems  typically  localise  the  AUV 
relative  to  a  support  vessel.  Although  successfully  used  in  many  open  ocean  and 
near  shore  situations,  key  issues  in  this  method  arise  from  the  propagation  of  sound 
underwater  such  as  reflections,  echoes,  and  changes  in  water  density  [12, 14]. 

An  alternative  to  both  odometry  and  acoustic  transponders  is  to  reference  the  ve¬ 
hicle’s  position  against  environmental  features  such  as  the  seabed  terrain  or  salient 
image  features.  This  approach  is  advantageous  as  no  other  objects  need  to  be  in¬ 
troduced  into  the  environment  thus  reducing  equipment  cost,  deployment  time  and 
survey  requirements.  Many  such  approaches  use  Simultaneous  Localisation  And 
Mapping  (SLAM)  to  produce  an  environment  map  for  localisation  which  has  the 
advantage  of  allowing  exploration  without  a  map  and  the  side  benefit  of  construct¬ 
ing  a  map  of  the  environment  [9].  These  methods  are  often  used  in  conjunction  with 
an  odometry  system  [8]  for  improved  localisation  in  low  feature  environments  and 
during  decent  to  and  ascent  from  the  seafloor. 

In  this  paper,  we  propose  an  intermediate  approach  to  AUV  localisation  by  util¬ 
ising  existing  acoustic  sensor  network  nodes  (underwater  modems)  designed  for  ad- 
hoc  data  communication  [2]  instead  of  deploying  a  dedicated  long  or  short  baseline 
localisation  system.  Furthermore,  this  paper  describes  the  development  of  a  network 
based  time-of-flight  localisation  algorithm  which  is  implemented  onboard  the  AUV 
in  real-time  to  allow  navigation  within  challenging  and  unstructured  environments. 


2  Sensor  Network  Based  Localisation 

Underwater  wireless  sensor  networks  provide  a  means  to  remotely  monitor  a  set  of 
parameters  at  multiple  locations  distributed  throughout  a  marine  environment.  Many 
challenges  exist  to  ensure  reliable  communications  which  include  environmental  ef¬ 
fects  as  well  as  hardware  limitations  [1].  As  such,  most  reported  sensor  nodes  use 
either  point-to-point  communications  between  two  nodes,  or  preset  routing  tables  in 
multi-node  systems.  Previous  work  realised  ad-hoc  underwater  communications  by 
using  commercially  available  acoustic  modems  that  were  controlled  by  low  power 
processor  boards  attached  to  each  modem  to  act  as  the  sensing  platform  and  imple¬ 
ment  a  networking  layer  [2] .  This  forms  the  foundation  for  the  network  used  in  this 
investigation. 


2.1  Scenario  Description 

This  paper  considers  the  use  of  acoustic  sensor  network  nodes  already  deployed 
in  the  environment  as  a  means  of  providing  localisation  information  to  an  AUV  in 
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place  of  dedicated  long  baseline  transponders.  An  AUV  is  equipped  with  an  acoustic 
modem  allowing  it  to  communicate  with  the  surrounding  sensor  nodes  and  from  the 
round  trip  time  determine  its  distance  to  the  node.  In  a  typical  scenario,  an  AUV 
would  operate  in  and  around  a  sensor  network,  acting  as  an  additional  mobile  sensor 
in  a  long  term  infrastructure  or  environmental  monitoring  project.  In  such  a  situation 
network  relative  localisation  is  necessary  for  both  navigation  and  registering  sensor 
data. 

The  system  proposed  here  is  based  on  the  assumption  that  sensor  nodes  are 
placed  arbitrarily  within  the  environment,  however,  their  position  and  depth  below 
the  surface  is  known  to  the  AUV  with  a  bounded  uncertainty.  It  is  also  assumed 
that  not  all  nodes  are  within  communication  range  of  the  AUV  and  communication 
reliability  between  the  AUV  and  a  sensor  node  is  not  perfect.  This  is  plausible  in 
practice  when  the  sensor  network  extends  over  great  areas  such  as  in  lakes,  rivers, 
or  around  coral  reefs  where  line-of- sight  communication  is  not  available. 

2.2  Localisation 

In  previous  work  [3,  4],  a  localisation  system  was  developed  and  demonstrated 
which  used  custom  built  acoustic  transponders  utilising  TDM  A.  This  system  al¬ 
lowed  accurate  localisation  of  the  AUV  and  self-localisation  of  the  transponders 
within  the  environment.  The  solution  proposed  here  differs  in  that  there  is  no  clock 
synchronisation  between  the  AUV  and  the  sensor  nodes,  rather  round  trip  times  are 
measured.  This  also  means  that  nodes  are  not  consuming  energy  via  transmission 
just  to  maintain  network  synchronisation  when  the  AUV  is  not  within  range. 

As  the  elapsed  round  trip  time  is  being  measured  at  the  message  (communication 
packet)  level,  it  is  influenced  by  factors  such  as  the  delay  for  modulating  and  demod¬ 
ulating  the  signals,  packet  length  and  other  inherent  hardware  delays.  Therefore,  the 
estimate  of  range,  R ,  can  be  determined  by  the  measurement  of  the  time-of-flight, 
ttof ,  and  systematic  hardware  and  software  delays  by 


(1) 


where  dA/N  and  dN/A  are  the  actual  distances  from  the  AUV  to  the  Node  and  from 
the  Node  to  the  AUV  respectively,  AtModem/Packet  is  the  internal  time  delay  of  the 
modem  for  packet  handling  which  is  assumed  deterministic,  Atnode  is  the  time  over¬ 
head  of  the  node  controller  again  assumed  deterministic,  and  vw  and  vw  is  the  actual 
and  approximated  velocity  of  sound  in  water. 

The  system  fuses  time-of-flight  data  from  the  beacons  (sensor  nodes)  with  self 
motion  and  depth  measurements  made  by  the  vehicle  using  a  particle  filter  [13]. 
A  Kalman  filter  could  also  achieve  this,  however,  using  a  particle  filter  allows  the 
possibility  of  multi-modal  distributions  which  would  be  necessary  in  global  locali¬ 
sation.  The  vehicle  pose  is  represented  as 
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X  =  {x,y,z}r 


(2) 


where  v  and  y  are  the  vehicle’s  Cartesian  coordinates  referenced  to  a  global  frame, 
and  z  is  the  vehicle’s  depth.  In  practice  the  vehicle’s  depth  sensor  is  sufficiently 
precise  to  not  need  filtering,  however,  a  3D  filter  is  used  to  allow  for  potential  cali¬ 
bration  errors  in  the  depth  sensor  and  for  differences  between  the  depth  zero  point 
between  the  AUV  and  the  beacons.  In  this  study,  an  observational  model  on  depth 
is  applied  as  a  normal  distribution  with  a  =  0.1m. 

The  range  estimation  has  two  error  conditions;  (1)  normally  distributed  mea¬ 
surement  noise,  and  (2)  echoes.  Measurement  noise  is  modelled  as  being  normally 
distributed  with  a  a  =  3m  (See  Section  4).  The  range  measurements  are  modelled 
as  having  a  95%  chance  of  being  a  true  return  as  opposed  to  an  echo  (P(E)  =  0.05), 
which  removes  the  need  for  explicitly  filtering  outliers  before  updating  the  filter. 
Algebraically,  the  probability  of  a  pose  estimate,  X,  given  a  range  estimate,  R ,  from 
modem  i,  is  given  by 


(3) 


where  rmax  is  the  maximum  permissible  range,  N(x,  o)  is  a  zero  mean  Gaussian  and 
each  of  the  modem  locations,  m,  is  defined  as 


m,  =  {xmi  ,  ymi  ,  depthmi}T 


(4) 


In  this  investigation,  1000  particles  are  used  which  can  easily  be  processed  in 
real-time  on  the  AUV’s  onboard  computer.  Resampling  occurs  whenever  range  or 
depth  measurements  are  made. 

As  the  proposed  AUV  does  not  have  Doppler  or  visual  odometry  (assumed  turbid 
water  in  this  investigation)  the  motion  model  for  the  vehicle  is  based  on  motor  force 
and  vehicle  hydrodynamics.  The  dominant  source  of  error  in  the  motion  model  is 
assumed  to  be  an  unknown  current  acting  on  the  vehicle.  The  unfortunate  situation 
for  AUVs  without  Doppler  Velocity  Logs  or  precision  inertial  measurement  units  is 
that  there  is  no  way  to  account  for  external  disturbances.  This  places  the  burden  on 
the  localiser  to  detect  and  correct  disturbances.  This  current  is  assumed  to  have  a 
zero  mean,  normally  distributed  velocity  with  0.5  to  1.2  m/s  standard  deviation. 


3  Experimental  Platforms 

The  acoustic  sensor  network  nodes  and  CSIRO  developed  Starbug  AUV  used  in  this 
investigation  are  described  below. 
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(a)  Internal  view  of  an  acoustic  underwater  sensor  (b)  The  Starbug  Mklll  AUV. 

network  node. 


Fig.  1  Acoustic  underwater  sensor  network  node  and  the  Starbug  AUV  used  in  network  based 
localisation  trials. 


Acoustic  Sensor  Network  Nodes 

The  sensor  network  nodes  used  in  this  investigation  were  developed  for  ad-hoc  un¬ 
derwater  communications  and  consist  of  an  acoustic  modem,  a  custom  developed 
“node-controller”,  power  supply  and  environment  sensors  as  shown  in  Figure  1(a). 

The  modem  used  was  a  commercially  available  CDMA  modem  1 ,  chosen  for  its 
relatively  low  cost,  in-built  broadcast  capabilities,  and  RS232  interface,  although 
its  maximum  transmission  rate  is  only  480bps.  It  should  be  noted  that  any  acoustic 
modem  that  permits  broadcast  messaging  at  a  sufficient  bit  rate  could  be  used  with 
only  slight  changes  to  the  software  that  runs  on  the  node  controller. 

The  “node  controller”  is  a  custom  system  designed  to  provide  the  networking 
layer  that  most  commercial  modems  lack  [2].  The  ad-hoc  networking  protocol  is 
based  on  a  modified  Dynamic  Source  Routing  (DSR)  approach  and  can  be  con¬ 
figured  for  maximizing  information  throughput  or  minimising  energy  expenditure. 
The  node  controller  software  is  implemented  on  CSIRO’s  “Fleck”  wireless  sensor 
network  embedded  hardware  [11].  Apart  from  the  need  to  model  message  handling 
delays,  the  localisation  algorithm  is  not  strongly  dependendent  upon  the  details  of 
the  node  controller.  Range  measurement  messages  are  short  (4  bytes  of  data)  con¬ 
sisting  of  source  and  destination  addresses,  message  type  and  a  temporary  message 
identifier. 

Autonomous  Underwater  Vehicle 

Figure  1(b)  shows  the  Starbug  AUV  that  was  used  in  these  experimental  trials  [6], 
which  is  a  shallow  water  research  vehicle  with  an  operating  speed  of  0.5  to  1.0  m/s 
and  a  maximum  depth  of  40  m.  Unlike  many  other  AUVs,  Starbug  uses  only  visual 


1  Aquacomm  acoustic  modem  from  DSPComm  (www.dspcomm.com) 
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odometry  provided  by  a  downward  facing  pair  of  stereo  cameras  which  also  record 
video  data  [7].  While  this  approach  provides  accurate  odometry  information  it  is 
only  available  when  the  AUV’s  altitude  is  low  and  degrades  when  operating  in  low- 
light  or  turbid  water,  or  when  descending  through  the  water  column.  This  capability 
is  not  used  in  these  experiments  so  that  the  worst  case  localisation  is  presented. 

One  of  the  sensor  nodes  described  above  is  mounted  on  the  AUV  allowing  it 
to  communicate  with  underwater  nodes  that  are  within  range.  The  AUV  interfaces 
with  the  onboard  sensor  node  via  a  serial  connection  allowing  it  to  interrogate  sur¬ 
rounding  nodes  in  turn  and  estimate  time-of-flight.  In  general  operation,  one  range 
measurement  is  made  every  two  seconds. 


4  Experimental  Results 

The  proposed  sensor  network  based  localisation  system  described  in  Section  2  was 
tested  using  an  underwater  network  at  Lake  Wivenhoe  in  Queensland,  Australia.  Ex¬ 
periments  included  characterisation  of  the  modems  with  node-controllers  for  inclu¬ 
sion  in  the  particle  filter,  as  well  as  evaluation  of  the  AUV  localisation  performance 
with  varying  number  of  sensor  nodes  and  geometric  distribution. 

4.1  Acoustic  Modem  Characterisation 

The  first  experiments  consisted  of  assessing  the  communication  throughput  for  a 
static  network  in  which  data  packets  are  transmitted  from  a  source  to  sink  node  in  a 
multi-hop  fashion.  Figure  2(a)  shows  that  packet  throughput  with  increasing  range 
illustrating  that  under  static  conditions  data  throughput  between  the  deployed  sensor 
nodes  deteriorates  significantly  after  approximately  1200m. 

The  modems  used  have  an  onboard  active  receive  gain  adjustment  not  accessible 
to  our  node-controller.  During  initial  trials  it  was  observed  that  as  the  AUV  moved 
around  the  environment  the  range  request  packets  from  the  AUV  would  not  be  reli¬ 
ably  received.  As  such,  each  modem  was  sent  two  range  requests  in  succession  (two 
seconds  apart)  to  allow  the  modem  to  adjust  its  receive  gain  if  necessary.  Figure  2(b) 
shows  the  results  of  successful  range  request  returns  as  a  function  of  range  as  the 
AUV  moves  throughout  the  sensor  fields.  As  seen,  on  average,  a  greater  number  of 
second  requests  were  successful.  This  assisted  in  receiving  ranges  to  nodes,  how¬ 
ever,  it  slowed  the  rate  at  which  all  nodes  in  the  sensor  field  could  be  interrogated. 

Figure  3(a)  shows  the  range  to  a  sensor  node  from  the  AUV  as  it  moves  through¬ 
out  the  sensor  field  from  time-of-flight  measurements  against  the  range  as  calculated 
from  GPS  (AUV  antenna  was  at  the  surface).  As  seen,  a  general  linear  relationship 
exists,  however,  a  number  of  false  readings  are  present  as  a  result  of  echoes  within 
the  environment  creating  longer  return  path  lengths  (and  hence  time-of-flight). 

Figure  3(b)  shows  the  range  estimation  error  for  the  data  in  Figure  3(a)  after 
discarding  range  measurements  that  can  be  attributed  to  echoes.  The  remaining 
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(a)  Measured  modem  packet  loss  statistics  with  a  (b)  Rate  of  success  from  first  and  second 
calculated  limit  curve  based  on  cylindrical  spread-  range  requests, 
ing  and  a  Gaussian  channel. 


Fig.  2  Static  network  node  performance  for  normal  intemode  communication  and  the  success  rate 
of  ranging  requests  from  the  AUV. 


Error  (m) 


(a)  Effect  of  echoes  within  environment  (b)  Normalised  distribution  of  range  error 
and  false  range  estimates.  after  the  removal  of  multipath  effects. 


Fig.  3  Measured  modem  ranging  performance  with  echoes  and  range  estimate  performance. 


range  errors  are  approximately  normally  distributed  with  a  standard  deviation  of 
approximately  3  m.  For  this  environment  a  worst  case  estimate  of  the  probability  of 
multi-pathing  is  5%.  Both  of  these  effects  are  used  in  the  sensor  model  described  in 
Section  2. 

4.2  Localisation  Performance 

The  localisation  performance  was  measured  by  sending  the  AUV  on  a  trajectory 
consisting  of  a  set  of  waypoints  such  that  the  GPS  antenna  was  at  the  water’s  surface 
for  ground  truth.  The  experiments  consisted  of  a  different  number  of  sensor  nodes 
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(b)  Localisation  error  as  a  function  of 
time. 


Fig.  4  (a)  The  actual  and  estimated  path  using  the  localiser  with  three  sensor  nodes  within  com¬ 
munication  range  of  the  AUV.  Also  shown  are  the  locations  of  the  sensor  nodes  (diamonds)  and 
the  covariance  of  the  localisation  estimate.  A  total  of  397  range  updates  (out  of  approximately  600 
attempted)  were  received  during  the  experiment,  (b)  The  error  in  position  between  ground  truth 
and  localiser  as  a  function  of  time  with  a  bounded  RMS  value  of  4.4m. 


within  communication  range  and  in  different  locations  of  Lake  Wivenhoe  with  water 
depth  ranging  from  6  to  28  m.  The  vehicle  speed  was  set  to  0.5  m/s  and  the  particle 
filter  was  initialised  to  the  GPS  position  at  the  start  of  each  experiment. 

Figure  4  shows  the  results  of  a  trial  with  three  sensor  nodes  within  communica¬ 
tion  range  of  the  AUV.  The  error  in  position  between  ground  truth  and  localiser  is 
bounded  with  an  RMS  value  of  4.4m.  Figure  5  shows  a  second  trial  with  four  sensor 
nodes  within  communication  range  of  the  AUV  during  a  rectangular  survey  mission. 
As  seen,  the  localisation  error  is  bounded  with  an  RMS  value  of  3.  lm  illustrating  an 
improved  localisation  estimate  with  a  different  node  arrangement. 

The  performance  of  the  system  was  evaluated  in  the  limiting  case  where  only  one 
sensor  node  is  within  communication  range  (single  beacon  localisation).  Using  the 
data  acquired  from  the  three  sensor  node  case,  Figure  6  shows  the  performance  of 
the  system  using  only  one  of  the  sensor  nodes.  Compared  to  the  first  scenario  the 
number  of  range  measurements  has  been  reduced  from  397  to  168.  The  RMS  local¬ 
isation  error  is  7.7  m,  although  greater  than  previous  experiments  it  does  illustrate 
that  the  AUV  is  still  capable  of  estimating  its  position  with  an  accuracy  suitable  for 
navigating  in  most  underwater  environments. 


5  Conclusions 

An  approach  to  localisation  of  an  AUV  using  an  ad-hoc  acoustic  underwater  net¬ 
work  has  been  presented.  The  system  utilises  existing  sensor  network  infrastructure 
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time. 


Fig.  5  Localisation  performance  of  particle  filter  over  the  course  of  a  rectangular  survey  exper¬ 
iment  with  four  sensor  nodes  within  communication  range.  A  total  of  154  range  updates  were 
received  (out  of  approximately  285  attempted)  during  the  experiment  and  the  RMS  error  was  3.1 
m. 


(a)  Estimated  and  actual  trajectories.  (b)  Localisation  error  as  a  function  of 

time. 


Fig.  6  Localisation  performance  of  particle  filter  over  the  course  of  a  survey  experiment  with  one 
sensor  node  within  communication  range. 


with  estimates  of  communication  overhead  to  determine  time-of-flight  information 
between  the  AUV  and  individual  sensor  nodes.  A  particle  filter  based  localisation 
system  was  developed  and  implemented  in  real-time  onboard  the  AUV.  Experimen¬ 
tal  field  results  have  demonstrated  that,  despite  the  absence  of  precise  self  motion 
estimate  and  a  slow  rate  of  ranging  updates,  the  system  is  able  to  provide  localisation 
performance  sufficiently  accurate  for  navigation  within  and  immediately  outside  the 
sensor  network  communication  field. 
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Current  research  is  focused  on  improving  localiser  performance  by  incorporating 
more  accurate  vehicle  motion  estimates  using  the  AUV’s  existing  visual  odometry 
system  [7].  Additionally,  the  system  is  being  expanded  to  perform  network  self  lo¬ 
calisation.  This  could  be  accomplished  using  network  localisation  techniques  such 
as  in  Djusgash,  Singh,  et  al.  2008  [5].  In  the  scenario  of  long  term  underwater  mon¬ 
itoring  this  would  allow  extra  sensor  nodes  to  be  added  to  the  network  without  re¬ 
quiring  precise  knowledge  of  the  sensors  location.  Finally,  the  integrated  AUV  and 
sensor  network  system  performance  is  currently  being  evaluated  in  different  ocean 
environments  including  bays  and  coral  reefs. 
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Experiments  in  Visual  Localisation  around 
Underwater  Structures 


Stephen  Nuske,  Jonathan  Roberts,  David  Prasser  and  Gordon  Wyeth 


Abstract  Localisation  of  an  AUV  is  challenging  and  a  range  of  inspection  applica¬ 
tions  require  relatively  accurate  positioning  information  with  respect  to  submerged 
structures.  We  have  developed  a  vision  based  localisation  method  that  uses  a  3D 
model  of  the  structure  to  be  inspected.  The  system  comprises  a  monocular  vision 
system,  a  spotlight  and  a  low-cost  IMU.  Previous  methods  that  attempt  to  solve  the 
problem  in  a  similar  way  try  and  factor  out  the  effects  of  lighting.  Effects,  such  as 
shading  on  curved  surfaces  or  specular  reflections,  are  heavily  dependent  on  the  light 
direction  and  are  difficult  to  deal  with  when  using  existing  techniques.  The  novelty 
of  our  method  is  that  we  explicitly  model  the  light  source.  Results  are  shown  of  an 
implementation  on  a  small  AUV  in  clear  water  at  night. 


1  Introduction 

We  are  interested  in  the  localisation  of  underwater  robots  around  fixed  infrastruc¬ 
ture.  There  are  many  applications  of  underwater  robotics  where  it  is  critical  for  the 
robot  to  know  where  it  is  with  respect  to  a  structure,  such  as  inspection  tasks  and 
welding.  Assuming  that  most  structures  are  passive,  ie.  they  do  not  transmit  any  lo¬ 
cation  information,  then  there  are  two  viable  sensing  modalities  that  can  be  used  to 
image  a  structure;  sonar  and  computer  vision.  Of  these,  we  have  been  investigating 
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the  use  of  vision  in  order  to  localise  an  Autonomous  Underwater  Vehicle  (AUV) 
with  respect  to  a  known  piece  of  underwater  infrastructure  -  the  leg  of  a  surface 
platform. 

Typically,  the  visual  environment  around  such  a  structure  is  poor.  Firstly,  sus¬ 
pended  particles  in  the  water  reduce  visibility.  Secondly,  there  is  minimal  or  no  nat¬ 
ural  lighting  deep  underwater,  thus  requiring  an  artificial  light  source  to  be  mounted 
on  the  AUV.  Thirdly,  the  visual  appearance  of  the  structure  in  this  scenario  is  highly 
dependent  on  the  incident  angle  of  the  light  source.  The  light  source  is  constantly 
moving  (as  it  is  on  the  AUV)  and  consequently  the  visual  appearance  of  the  structure 
varies  dramatically  over  time.  This  is  quite  different  from  typical  well  lit  environ¬ 
ments,  where  the  light  source  (typically  the  Sun)  is  far  less  dynamic  and  also  where 
there  is  a  significant  level  of  ambient  lighting.  However,  rather  than  this  poor  visual 
environment  being  a  negative,  we  would  argue  that  we  can  turn  it  to  our  advantage. 
By  modeling  the  light  source  mounted  on  the  AUV  we  can  predict  the  appearance 
of  the  structure  (the  legs  of  a  platform  in  our  example)  from  different  viewing  poses. 
The  process  is  to  use  an  a  priori  3D-surface  model  of  the  permanent  structure  be¬ 
ing  navigated  with  the  light  model  to  generate  artificial  images  which  are  compared 
against  the  real  camera  image  to  localise  the  AUV. 


2  Previous  Work 

Kondo  et  al.  present  two  methods  of  navigating  underwater  structures  in  [10]  and 
[9].  In  [10],  two  laser  beams  are  directed  at  the  structure  which  are  detected  in  the 
camera  images  to  triangulate  the  relative  distance  and  orientation  of  the  vehicle.  In 
[9],  Kondo  et  al.  use  a  light  stripe  to  illuminate  a  2D  profile  of  the  structure  which  is 
detected  in  the  camera  images.  A  common  feature  of  the  two  systems  developed  by 
Kondo  et  al.  is  the  use  of  active  lighting.  In  our  work  an  artificial  light  source  is  also 
used,  but  unlike  the  focused  beams  or  light  stripes  of  Kondo  et  al.,  the  light  source 
is  unfocused. 

Stolkin  et  al.  [13]  present  work  for  a  submarine  localising  from  a  leg  of  a  platform 
and  use  an  explicit  3D  model  of  the  structure,  projecting  the  model  onto  the  image 
plane  to  predict  the  shape  of  the  structure.  Model  based  tracking  is  an  attractive 
approach  for  this  application  as  the  form  of  the  structure  is  well  known  a  priori. 
Fig.  1  shows  the  basic  idea  behind  model  based  visual  tracking.  Synthetic  images 
are  generated  for  a  large  number  of  possible  robot  poses  and  each  of  these  images 
is  compared  with  the  actual  image  captured  by  the  robot.  The  comparator  can  take 
many  forms.  Examples  include  taking  the  pose  that  gives  the  best  match,  or  using  a 
multi-hypothesis  framework  such  as  a  particle  filter  [8]. 

In  the  wider  robotics  field,  model-based  tracking  has  received  much  attention. 
The  works  of  Gerard  and  Gagalowicz  [4],  Noyer  et  al.  [12]  and  Ho  and  Jarvis  [5] 
present  pose  estimation  systems  based  on  3D-surface  maps.  They  perform  corre¬ 
spondences  between  real  and  synthetic  images.  Both  Noyer  et  al.  [12]  and  Ho  and 
Jarvis  [5]  estimate  pose  with  a  probabilistic  particle  filter,  which  is  an  efficient 
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Pose  Estimate 


Fig.  1  An  overview  of  the  idea  of  model-based  visual  pose  estimation.  Multiple  synthetic  images 
(generated  at  different  possible  poses  of  the  AUV)  are  compared  with  the  real  camera  image  of  the 
scene. 


means  of  sampling  the  solution-space,  whereas  Gerard  and  Gagalowicz  [4]  present 
a  more  brute  force  evaluation  of  the  solution- space.  None  of  these  3D-surface  based 
methods  consider  reflectance  and  lighting  properties  in  their  work  -  they  only  use 
textured  3D  models  -  which  do  not  generalize  to  any  lighting  condition.  A  tex¬ 
tured  model  would  not  suffice  for  the  application  presented  in  this  paper,  because 
the  structure  is  made  of  one  material  and  is  therefore  essentially  texture-less.  The 
images  of  the  structure  are  also  highly  dependent  on  the  light  source,  indicating  that 
both  reflectance  properties  of  the  structure  and  a  light  model  should  be  known. 

The  work  of  Kee  et  al.  [7]  and  Blicher  et  al.  [1],  in  the  domain  of  face  identifi¬ 
cation,  introduce  the  idea  of  using  a  3D- surface  model  together  with  a  light  model. 
They  show  how  to  perform  face  identification  in  unknown  lighting  conditions  by 
first  estimating  the  current  light  source,  then  generating  synthetic  images  of  each 
face  model  using  the  estimated  light  source  model.  They  used  a  database  of  many 
different  3D-surface  face  models.  A  single  fixed  pose  of  the  faces  with  respect  to  the 
camera  was  assumed,  then  multiple  synthetic  face  hypothesis  images  were  matched 
to  the  real  image.  However,  in  our  work  there  is  a  single  3D- surface  map  of  the 
environment  (the  structure)  and  multiple  pose  hypotheses  that  are  matched  to  a  real 
image  (taken  from  the  AUV).  The  pose  hypotheses  with  the  best  image  match  to  the 
camera  image  from  the  robot  will  provide  the  pose  estimate.  This  idea  of  estimating 
and  incorporating  a  light  model  has  not  yet  been  applied  to  visual  localisation. 


3  Localisation  Framework 

Our  framework  uses  a  model  of  the  structure  and  a  model  of  light  source  together 
to  generate  synthetic  images  that  are  expectations  of  the  real  camera  images.  The 
synthetic  images  are  compared  to  the  real  images  to  estimate  the  pose  of  the  camera. 
The  pose  estimation  is  facilitated  in  a  probabilistic  multiple  pose  hypothesis  frame¬ 
work  -  a  particle  filter  -  which  uses  a  synthetic  image  of  the  structure  from  each 
pose  hypothesis  to  derive  a  comparison  score  against  the  real  image. 
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Fig.  2  The  AUV  has  a  forward  facing  camera  with  a  field-of-view  depicted  in  the  figure  by  pyramid 
viewing  volume.  The  spotlight  is  also  facing  forward  and  partially  illuminates  the  field-of-view  of 
the  camera  (depicted  by  the  inner  cone). 


A  forward  looking  spotlight  and  camera  are  mounted  rigidly  to  the  AUV  [3] 
which  is  inspecting  a  structure.  Fig.  2  shows  the  AUV,  the  coordinate  system,  the 
camera  and  spotlight- setup.  A  single  forward-facing  camera,  from  the  AUV’s  stereo 
pair,  is  used.  The  field-of-view  of  the  camera,  is  shown  as  pyramidal  viewing  vol¬ 
ume  in  Fig.  2.  The  extrinsic  pose  of  the  camera  is  calculated  with  respect  to  the 
vehicle,  and  is  facing  along  the  vehicle’s  positive  v  axis.  The  camera’s  intrinsic  pa¬ 
rameters  were  calibrated  using  the  OpenCV  library  [6].  The  camera  images  are  then 
undistorted  by  these  parameters,  and  the  model  of  the  structure  can  then  be  projected 
directly  onto  the  image  plane. 


3.1  Synthetic  Image  Generation 

The  synthetic  images  are  generated  from  a  polygon  mesh  of  the  structure.  The  mesh 
includes  the  surface  normal,  diffuse  and  specular  reflectance  properties.  Meshes  us¬ 
ing  such  detailed  surface  properties  have  rarely  been  applied  to  visual  localisation. 
These  photometric  properties  are  incorporated  into  the  Blinn-Phong  model  [2]  using 
the  OpenGL  library  for  generating  synthetic  images.  There  are  other  lighting  mod¬ 
els  which  could  be  also  used,  but  the  Blinn-Phong  model  is  chosen  for  its  simplicity, 
speed  of  computation  and  prevalent  implementation  on  most  graphics  processors.  In 
addition  to  the  pose  of  the  light,  the  model  incorporates  a  number  of  other  parame¬ 
ters  to  account  for  the  attenuation  by  water  and  angular  spread  of  the  light  source. 

The  structure  to  be  localised  against  is  comprised  of  three  steel  tubes,  linked 
together  by  smaller  rungs  at  approximately  45  degree  angles.  This  structure  is  mod¬ 
elled  as  a  polygon  mesh  of  each  of  the  tubes  by  defining  the  number  of  slices  and 
stacks  in  the  mesh  of  each  tube.  The  rendering  is  performed  with  per- fragment  light¬ 
ing  calculations,  and  shading  interpolation  to  the  pixels  within  the  fragment,  accord- 
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in g  to  the  light  model  defined  above.  A  series  of  tests  were  performed  to  calculate 
the  optimum  number  of  polygons  taking  into  account  render  times  and  model  ac¬ 
curacy.  For  this  model  and  using  an  Intel  dual  core  2. 3 3 GHz  CPU  and  a  NVIDIA 
Quadro  FX  350M  GPU  with  320x240  images,  it  was  found  that  optimum  value  of 
3120  polygons  equated  to  a  time  of  0.325ms  to  render  a  single  synthetic  image. 


3.2  Particle  Filter  Localisation 

The  use  of  a  particle  filter  is  described  in  detail  by  Thrun  et  al.  in  [14].  The  particle 
filter  is  a  set  of  N  pose  hypotheses  (particles)  Xt  =  x[l\x[2\x[^  . . .  ,x\N\  The  set  is 
sampled  from  the  previous  setXr_i  using  a  propagation  model  mt  and  a  correspond¬ 
ing  set  of  weights  (probabilities),  W.  The  weights  are  calculated  from  an  observation 
of  the  environment,  y,  as  follows: 

wln)=p(yk  l4"})  (D 

the  observation  of  the  environment  is  a  camera  image,  y,  which  is  compared  with 
each  pose  particle  v  by  rendering  a  synthetic  image.  The  measurement  of  probabil¬ 
ity  is  provided  from  an  image  matching  technique  (discussed  in  Section  3.3).  The 
concept  is  that  a  synthetic  image  generated  from  the  particles  nearest  the  correct 
pose  will  give  the  best  match  with  the  real  image.  These  particles  are  then  the  most 
likely  to  be  re-sampled  for  the  next  iteration.  The  current  pose  estimate  of  the  AUV 
is  extracted  from  the  filter  as  the  mean  pose  of  the  particles  with  the  highest  weights 
(top  5%).  We  use  roll  and  pitch  estimates  from  the  AUV’s  Inertial  Measurement 
Unit  (IMU)  as  each  particle’s  roll  and  pitch  estimate.  The  remaining  four  degrees 
of  freedom  are  propagated  using  a  constant  velocity  model  calculated  from  a  set  of 
previous  pose  estimates  extracted  from  the  particle  filter. 


3.3  Gradient-domain  Image  Matching 

The  comparison  process  between  the  camera  image  and  a  synthetic  image  provides 
a  likelihood  measure  for  the  set  of  particles  in  the  filter.  The  works  of  [1,  4,  7,  12], 
all  use  image  matching  techniques  which  compare  real  and  synthetic  images.  All  of 
the  techniques  are  variants  of  the  Mean  Absolute  Difference  (MAD). 

The  simple  image  matching  techniques  assume  that  it  is  possible  to  generate  pixel 
intensities  for  the  synthetic  image  that  are  equivalent  to  those  in  the  real  image.  This 
is  different  from  photo-realistic  rendering,  which  is  only  interested  in  making  the 
synthetic  image  appear  real.  Whereas,  these  image  intensity  matching  techniques 
require  the  environment  model  and  light  model  to  be  accurate  estimates  of  the  actual 
physical  properties  of  the  surrounding  environment.  The  parameters  of  such  models 
are  difficult  to  estimate  accurately.  Furthermore,  the  Blinn-Phong  image  formation 
model  [2],  employed  for  real-time  performance,  does  not  incorporate  the  ability  to 
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model  poor  visibility.  Suspended  tiny  particles  that  cause  poor  visibility  have  two  ef¬ 
fects  on  the  lighting;  absorbing  light  and  reflecting  light.  These  effects  would  need  to 
be  modelled  before  accurate  intensity  values  could  be  generated  in  the  synthetic  im¬ 
age.  It  is  difficult  to  generate  accurate  images  of  simulated  poor  visibility  conditions 
at  a  rate  quick  enough  for  this  framework.  For  this  reason,  intensity-based  matching 
is  not  used  and,  instead,  a  gradient-domain  image  matching  technique  is  developed. 
The  gradient-domain  removes  the  absolute  intensity  levels  whilst  capturing  the  sub¬ 
tle  shading  in  the  environment.  This  behaviour  is  different  to  an  edge-image  that 
identifies  drastic  boundaries  of  intensity. 

The  first  step  is  to  pass  the  real-image  through  a  Gaussian  filter,  which  removes 
the  effects  of  noise.  The  synthetic  and  real  images  are  then  both  passed  through  a 
horizontal  Sobel  operator  to  generate  gradient  images  in  both  the  x  and  y  directions; 
Gx  and  Gy  are  the  real  Sobel  images  and  gx  and  gy  are  the  synthetic.  Example  syn¬ 
thetic  and  real  images  are  shown  in  Fig.  3.  The  v  direction  is  shown  in  red  and  y  in 
green,  therefore  pixels  with  high  x  and  y  gradients  are  yellow.  To  compare  the  real 


1 
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Fig.  3  Top  Left:  Real  camera  image.  Top  Right:  Real  Sobel  image.  Bottom  Left:  Synthetic  image. 
Bottom  Right:  Synthetic  Sobel  Image.  Horizontal  gradient  is  shown  in  red  and  vertical  in  green; 
therefore  pixels  with  diagonal  gradients  are  yellow. 


and  synthetic  Sobel  images,  it  would  be  possible  to  turn  these  two  images  into  a 
gradient  magnitude  image  and  a  gradient  orientation  image,  which  would  enable  a 
more  logical  means  for  comparison.  But  to  avoid  the  expensive  square  root  and  arc 
tan  computations,  the  images  are  compared  directly  in  v  and  y  gradients. 

Firstly,  a  sum  is  taken  of  the  gradient  magnitude  in  the  real,  Sr,  and  synthetic,  Ss , 
images: 


Sr  = 


E(|G*(/?)|  +  |Gy(p)|) 


p= o 


N 


L(I&(p)I  +  Mp)I) 

P= 0 


(2) 
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where  N  is  the  number  of  pixels.  Secondly,  a  sum  of  the  difference  in  gradients 
between  real  and  synthetic  is  calculated  in  each  direction,  Dx,  Dy; 

Dx='£(\Gx(p)-gx{p)\)  Dy='£(\Gy{p)-gy{p)\)  (3) 

p= 0  p= 0 

the  final  image  matching  score  is  derived  by  subtracting  the  sum  of  the  gradient 
difference  from  the  sum  of  the  gradient  magnitude  and  normalizing  by  the  sum  of 
the  gradient  magnitude: 


n  _  (Sr  +  Ss)  (Dx-\-Dy)  //n 

~  5^  (4) 

This  result  equates  to  the  observation  y  and  the  particle  from  (1).  The  better  the 
match  between  the  images  the  larger  value  of  .  This  score  can  be  incorporated 
into  the  observation  y  and  the  particle  x^>  from  (1)  as  follows: 

W*n)  =  p(yk  )  oc  epD (5) 
Where  p  is  a  positive  constant  that  adjusts  the  convergence  of  the  particle  filter. 


4  Results 

An  experiment  was  conducted  at  night  in  clear  water  with  the  aim  of  determining  if 
the  visual  localisation  system  in  combination  with  the  IMU  could  localise  the  AUV 
as  it  moved  freely  in  all  six  degrees  of  freedom.  The  experiment  began  with  the  AUV 
approximately  1.5m  away  from  the  structure.  The  AUV  approached  the  structure, 
strafed  side  to  side,  descended  and  rotated  around  the  structure.  Note  that  there  was 
no  ground  truth  data  available  during  this  experiment  and  hence  the  performance  of 
the  system  could  only  be  checked  manually  by  inspecting  the  projected  centre  lines 
of  the  structure  from  the  estimated  pose,  and  confirming  they  align  correctly  in  the 
raw  camera  images.  Tracking  images  from  the  experiment  are  presented  in  Fig.  4, 
along  with  a  movie  of  the  results  can  be  found  in  the  video  attachment  located  at: 
http  ://www.  cat. csiro .  au/ict/download/nuske/auv_pooltest  1  .mpg 
The  visual  localisation  system  maintained  accurate  track  of  the  structure  for  440 
frames  where  there  were  significant  changes  in  scale,  orientation  and  translation. 
The  system  then  made  a  mistake  when  one  of  the  columns  of  the  structure  disap¬ 
peared  behind  another,  and  then  reappeared  on  the  other  side.  The  system  estimated 
the  column  reappearing  on  the  same  side,  and  did  not  recover  from  this  error.  The 
frames  just  before  and  just  after  the  disappearing  column  can  be  seen  as  the  bottom 
two  images  of  Fig.  4.  When  the  system  was  run  again,  and  again  over  the  same  data, 
it  did  occasionally  correctly  estimate  that  the  rear  column  appeared  on  the  other 
side.  However,  it  failed  more  times  than  it  succeeded  and  a  solution  to  this  problem 
is  currently  being  investigated. 


Stephen  Nuske,  Jonathan  Roberts,  David  Prasser  and  Gordon  Wyeth 


Fig.  4  Images  showing  the  tracking  of  the  oil  rig  structure.  Real  camera  image  is  overlaid  with  the 
centre  lines  of  the  structure  projected  from  the  estimated  pose.  Bottom  left  corner  of  each  image 
is  a  synthetic  rendering  of  the  structure.  The  frame  number  is  located  in  the  top  left  comer  of  each 
image.  Data  was  collected  at  15Hz. 
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Algorithmic  speed  and  efficiency  are  critical  to  ensure  a  practical  system  for  the 
target  application.  There  is  a  demand  to  use  low  power  processors  on  AUVs  due  to 
the  on  board  power  limitations  of  the  vehicles.  The  results  reported  above  were  pro¬ 
cessed  off-line.  However,  the  algorithm  has  been  implemented  in  a  such  a  way  as  to 
make  it  amenable  to  on-board  impementation.  The  resolution  of  the  3D-surface  was 
minimised,  and  the  image  processing  and  matching  algorithms  were  implemented 
on  a  GPU,  which  all  brought  the  processing  times  much  closer  to  real-time  rates. 
The  system  was  run  on  a  laptop  computer,  which  used  small  mobile  processors 
and  graphics  hardware.  It  achieved  a  frame  rate  of  0.5Hz  with  800  particles  using 
320x240  sized  images  and  would  therefore  require  the  AUV  to  be  moving  slowly 
with  respect  to  the  structure  to  be  able  to  track  it  reliably  in  real  time. 

The  two  potential  methods  to  achieve  higher  efficiencies,  would  be  to  further 
reduce  the  render  times  of  the  synthetic  images,  and  also  to  reduce  the  number  of 
particles  in  the  filter.  Reducing  the  render  time  could  involve  further  reductions  in 
the  polygon  counts,  only  passing  sections  of  the  model  that  are  in  view  to  the  graph¬ 
ics  pipeline,  optimising  the  lighting  calculations  or  with  improved/multiple  GPUs. 
Future  improvements  to  reduce  the  number  of  particles  could  include  using  a  two- 
stage  coarse-to-fine  particle  filter,  such  as  used  in  the  work  of  Klein  and  Murray  [8], 
or  to  develop  a  better  propagation  model.  The  depth  sensor  and  the  magnetic  com¬ 
pass  are  two  sensors  which  could  be  included  in  the  propagation  model.  However, 
it  would  need  to  be  confirmed  that  these  sensors  are  locally  consistent  in  the  desired 
environment  (that  is,  if  their  inter-frame  motion  estimates  are  accurate).  Another 
possible  method  of  improving  the  propagation  model  is  to  use  a  visual  odometry 
system.  Marchand  et  al.  [11]  present  such  an  approach. 


5  Conclusion  and  Future  Work 


We  have  presented  a  visual  localisation  system  that  explicitly  models  the  spotlight 
of  an  AUV  navigating  underwater  structures.  The  light  model  is  used  in  conjunction 
with  a  surface  model  of  the  structure  to  generate  synthetic  images  that  are  accurate 
representations  of  the  real  camera  image.  A  particle  filter  framework  is  employed 
where  a  synthetic  image  is  rendered  from  each  pose  hypothesis  and  a  observation 
function  computes  a  probability  through  comparison  with  the  real  camera  image. 
The  observation  function  compares  the  real  image  with  the  synthetic  images  oper¬ 
ates  in  the  intensity  gradient  domain,  avoiding  the  need  to  generate  precise  inten¬ 
sity  values  in  the  synthetic  image  and  allows  the  system  to  operate  in  poor  visi¬ 
bility  conditions  which  are  difficult  to  replicate  in  the  synthetic  image.  The  system 
was  tested  using  a  monocular  vision  system,  spotlight,  steel  structure  and  low-cost 
IMU.  Results  show  that  the  system  can  localise  the  vehicle  in  challenging  image 
sequences  where  the  light  source  is  constantly  moving  and  illuminating  the  scene 
non-uniformly. 

In  future  work  the  system  will  continue  to  be  developed  with  the  goal  of  a  fully 
functioning  system  in  the  targeted  offshore  environments.  Localising  from  struc- 
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tures  of  other  shapes  and  surface  characteristics  will  be  evaluated.  The  image  pro¬ 
cessing  algorithms  presented  here  are  essentially  generic  and  are  expected  to  be  able 
to  provide  similar  results  from  other  structures.  Improvements  to  the  lighting  model 
will  also  be  investigated,  such  as  modelling  the  spotlight  as  an  area  light  source  and 
also  using  a  more  accurate  model  of  the  surface  reflectance  properties  to  generate 
images  with  more  precise  representation  of  the  shading.  More  accurate  odometry 
information  will  also  be  employed  which  may  come  from  a  compass,  a  pressure 
(depth)  sensor  or  potentially  a  visual  odometry  algorithm.  This  information  is  ex¬ 
pected  to  greatly  improve  the  accuracy  and  computational  efficiency  of  the  system 
by  significantly  reducing  the  area  of  the  state  space  that  must  be  evaluated.  Ambigu¬ 
ous  visual  scenarios  which  have  been  the  cause  of  divergence  in  the  localisation  filter 
will  also  be  investigated  in  more  detail. 
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Summary.  We  present  a  “leap-frog”  path  designed  for  a  team  of  three  robots  per¬ 
forming  cooperative  localization.  Two  robots  act  as  stationary  measurement  beacons 
while  the  third  moves  in  a  path  that  provides  informative  measurements.  After  com¬ 
pleting  the  move,  the  roles  of  each  robot  are  switched  and  the  path  is  repeated. 
We  demonstrate  accurate  localization  using  this  path  via  a  coverage  experiment  in 
which  three  robots  successfully  cover  a  20m  x  30m  area.  We  report  an  approximate 
positional  drift  of  1.1m  per  robot  over  a  travel  distance  of  140m.  To  our  knowledge, 
this  is  one  of  the  largest  successful  GPS-denied  coverage  experiments  to  date. 


1  Introduction 

Localization  is  critical  for  the  navigational  aspect  of  many  robotic  applica¬ 
tions.  Without  accurate  positioning,  a  mobile  robot  would  get  lost,  wander 
away  from  its  target  workspace,  and  fail  to  complete  its  intended  task.  Addi¬ 
tionally,  there  are  many  situations  where  an  external  positioning  system,  such 
as  GPS,  is  unavailable  to  the  robot,  e.g.  indoors,  within  dense  vegetation,  and 
underwater.  To  solve  the  localization  problem,  a  team  of  robots  can  employ 
cooperative  localization  [1]  to  incorporate  relative  sensor  measurements  into 
a  Kalman  filter  framework  that  estimates  the  pose  of  the  robots. 

It  can  be  shown  that  the  accuracy  of  such  a  filter  is  dependent  upon  the 
path  the  robots  take.  This  is  due  to  the  fact  that  certain  measurements  are 
more  informative  than  others,  depending  on  the  vantage  point  of  the  sensor. 
We  believe  a  “leap-frog”  path,  as  in  [2,  3],  is  desirable  because  it  temporarily 
grounds  the  increasing  uncertainty  of  the  system  via  stationary  robots. 

The  contribution  of  this  work  is  the  introduction  of  a  new  leap-frog  path 
designed  to  produce  informative  measurements  for  three  robots  performing 
cooperative  localization.  We  also  report  a  20m  x  30m  large-scale  GPS-denied 
coverage  experiment  with  three  robots  (see  Fig.l)  that  was  only  possible  after 
the  gain  in  positioning  accuracy  provided  by  this  new  leap-frog  strategy. 
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Fig.  1.  Three  robots  used  for  experimental  evaluation  of  the  proposed  leap-frog 
localization  strategy. 

2  Related  Work 

The  majority  of  recent  work  on  cooperative  localization  ignores  the  path 
planning  aspect  of  this  topic  and  instead  focuses  on  filtering.  In  [1],  Roume- 
liotis  et.  al.  define  a  Kalman  filter  framework,  similar  to  our  formulation  in 
Sec.  3,  that  can  be  used  for  cooperative  localization.  In  [4],  they  iterate  on 
this  work  and  present  a  distributed  method  for  computing  the  Kalman  filter. 
Finally,  the  work  in  [5,  6]  studies  the  growth  of  uncertainty  under  different 
sensing  modalities  and  with  a  varying  number  of  robots. 

Some  recent  work  has  addressed  path  planning  and  control  for  robots 
performing  cooperative  localization.  Hidaka  et.  al.  [7]  present  derivations  to 
show  that  with  any  number  of  robots,  the  optimal  formation  for  accurate 
localization  is  a  “packed  circles”  configuration.  For  three  robots,  they  claim 
that  the  optimal  formation  is  an  equilateral  triangle.  Trawny  et.  al.  [8],  on 
the  other  hand,  perform  optimization  over  possible  multi-robot  paths  and 
demonstrate  a  performance  improvement  in  simulation.  Although  the  opti¬ 
mization  is  beneficial,  we  believe  this  method  is  susceptible  to  local  minima. 

Finally,  some  research  has  involved  the  investigation  of  leap-frog  paths 
for  cooperative  localization.  Navarro- Serment  et.  al.  [2]  use  a  group  of  small 
heterogeneous  robots  (Millibots)  for  localization  and  mapping.  The  authors 
use  leap-frog  paths  to  help  maintain  a  better  estimate  over  the  occupancy 
grid  map  and  their  EKF  localization.  Kurazume  [3]  introduces  leap-frog  paths 
for  cooperative  localization  as  well,  with  a  path  that  is  designed  to  represent 
triangle  chains  of  different  configurations.  Although  these  paths  prove  to  be 
accurate  solutions  for  localization,  Kurazume  does  not  analyze  these  paths 
from  an  information  theoretic  standpoint. 


3  Cooperative  Localization 

To  reduce  motion  error,  a  team  of  robots  can  employ  cooperative  localization , 
which  improves  the  estimate  of  the  state  via  relative  sensor  measurements 
between  robots.  This  type  of  filtering  method  can  be  implemented  with  an 
extended  Kalman  filter,  as  in  [1]. 
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The  state  vector  for  the  filter  Xk  is  defined, 

=  [  xf  xlT  .  .  .  Xf-lT  ]T  xt  =  [  4  yi  ei  }T , 

where  N  is  the  number  of  robots  and  Xk  represents  the  state  of  the  i-th 
robot  at  time  step  k.  In  this  formulation,  (xk,yk)  represents  the  position  of 
the  i-th  robot  and  0k  represents  that  robot’s  heading. 

The  state  process  equation  f(Xk,ulk )  is  based  on  a  unicycle  model, 

~xk  +  vk  cos  OkAt 

f(xlk,  4)  =  4  +  4 sin  eiAt  4  = 

ji+uiAt 

where  uk  is  a  motion  input  for  the  i-th  robot,  which  is  composed  of  a  trans¬ 
lational  velocity  v\  and  a  rotational  velocity  uj%k. 

The  measurement  equation  for  our  state  is  a  bearing-only  measurement, 

hlj(Xk)  =  arctan  f— — —  ^  —  Oj 
\Xi  —  Xj  J 

which  represents  the  relative  bearing  angle  to  the  i-th  robot  as  measured  by 
the  j-th  robot.  A  typical  sensor  that  provides  bearing  measurements  in  this 
form  is  a  monocular  camera. 

The  purpose  of  the  extended  Kalman  filter  (EKF)  is  to  recursively  esti¬ 
mate  the  state  mean  and  covariance  matrix  with  two  stages:  the  prediction 
step,  which  produces  the  estimated  mean  and  covariance,  Xk+i\k  and  Pk+i\k 
respectively,  as  well  as  the  measurement  update  step,  which  produces  an 
update  to  the  estimated  mean  and  covariance,  Xk\k  and  Pk\k  respectively. 
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The  matrix  Uk  is  a  covariance  matrix  with  2x2  matrices  along  its  diagonal 
(all  Uk  for  0  <  i  <  N—l  as  defined  below). 


Ul 


a\vi\  0 

0  /3|4l+7l<l 


(1) 


Matrix  Uk  represents  the  covariance  matrix  for  the  additive  white  Gaussian 
noise  that  is  expected  to  perturb  robot  V s  motion  input  u\.  Conventional 
implementations  use  a  static  covariance  for  this  purpose,  but  we  believe  a 
velocity  dependent  noise  model  is  more  accurate.  The  model  in  Eq.  1  accounts 
for  the  fact  that  wheel  slippage  is  more  pronounced  at  higher  speeds  and  that 
zero  additive  noise  should  be  expected  when  the  robots  are  stationary. 


3.2  Measurement  Update  Step 


To  properly  incorporate  the  information  provided  by  the  bearing  sensors,  we 
perform  a  correction  to  the  predicted  state  estimate, 


H)  = 


dh){Xk  ifc-i) 
dXk 


K  =  Pklk_1HT(HPk[k_1HT  +  R)~ 
Xk\ k  =  -\ /,•  k  i  +  K(zk  —  h(Xk |fc_i)) 

Pk\k  Pk\k  1  K  P  Pk\k  -  1 


(2) 


where,  for  M  bearing  measurements,  K  is  the  Kalman  gain,  H  is  the  mea¬ 
surement  Jacobian,  and  R  is  an  Mxilf  matrix  with  diagonal  elements  az  (the 
variance  associated  to  a  single  measurement).  The  Jacobian  H  is  constructed 
by  appending  together  all  row  vectors  H, •  for  each  measurement  between  a 
robot  i  and  another  robot  j.  Likewise,  h  is  constructed  by  appending  together 
all  hl-  to  form  a  column  vector.  Zk  is  the  measurement  vector  to  which  h  is 
associated. 

It  is  important  to  realize  that  the  effectiveness  of  this  cooperative  local¬ 
ization  filter  is  dependent  upon  the  path  of  the  robots.  This  can  be  seen 
in  Eq.  2  where  a  positive  definite  matrix  KHPk\k_1  is  subtracted  from  the 
predicted  covariance  Pk \k~i-  Since  H  is  dependent  upon  the  state  estimate 
the  reduction  in  uncertainty  via  subtracting  KHPk\k_i  will  vary  in 
amount  depending  on  the  configuration  of  the  robots. 


4  Leap-Frog  Path  Design 

In  Sec.  3,  we  discuss  how  the  effectiveness  of  the  cooperative  localization 
filter  is  dependent  upon  the  path  of  the  robots.  This  suggests  that  by  careful 
path  planning,  we  can  achieve  better  position  accuracy  during  experiments. 
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As  in  [2,  3],  we  suggest  the  use  of  a  “leap-frog”  path  for  a  team  of  robots, 
where  at  any  given  time,  a  subset  of  the  robots  temporarily  act  as  stationary 
measurement  beacons  while  the  other  robots  are  in  motion. 

A  “leap-frog”  path  is  intuitively  beneficial  for  the  Kalman  filter  because 
when  robots  are  stationary,  they  will  not  gain  any  positioning  noise,  thus 
temporarily  grounding  the  normally  increasing  uncertainty  of  the  system.  A 
moving  robot  can  move  around  at  will  without  concern  for  its  added  predic¬ 
tion  noise  because  it  can  easily  visit  the  nearby  stationary  robots  to  drive  its 
position  uncertainty  down  to  their  level  via  relative  sensor  measurements. 

4.1  Three-Robot  Path  Design  via  Information  Gain 

The  use  of  three  robots  for  localization  is  a  good  fit  for  bearing-only  measure¬ 
ments  because  the  intersection  of  two  bearing  rays  from  two  different  robots 
will  triangulate  the  location  of  a  third  robot,  albeit  with  error  due  to  noise. 
To  investigate  path  design  for  a  team  of  three  robots,  we  consider  the  mea¬ 
surement  update  equation  for  the  information  filter,  which  is  a  dual  to  the 
Kalman  filter  and  is  commonly  used  in  localization  and  mapping  algorithms, 
such  as  [9].  The  measurement  update  is  as  follows, 


h\k  —  h\k-i  +  HtR  1H , 


where  the  information  matrix  Ik\k  =  Pk ^  is  the  inverse  of  the  covariance 
matrix.  In  this  work,  we  define  the  information  gain  G(Xk )  as  a  norm  of 
the  positive  definite  matrix  that  is  added  to  the  information  matrix  during 
a  measurement  update, 


G(Xk)  =  tr  (iLTR_1iL)  . 


The  information  gain  depends  on  the  state  Xk  through  the  measurement 
Jacobian  H.  We  argue  that  states  producing  a  larger  information  gain  will 
offer  measurements  that  are  more  informative  to  the  Kalman  filter. 


(0 , d/2)  £7}^ 


(x,y) 


£3 


Fig.  2.  Stationary  robots  0  and  1  are  a  distance  d/2  away  from  the  x-axis.  An 
analysis  of  the  information  gain  is  used  to  obtain  the  appropriate  leap-frog  path 
for  robot  2. 
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To  investigate  the  path  optimization  problem  for  three  robots,  we  consider 
the  situation  in  Fig.  2  where  two  robots  (0  and  1)  lie  stationary  on  the  y- axis 
an  equal  distance  away  from  the  x-axis  (with  a  separation  distance  d).  The 
third  robot  (robot  2)  acts  as  the  moving  robot  in  this  leap-frog  strategy. 

For  any  pose  (x,  y ,  0)  of  robot  2  in  Fig.  2,  the  filter  will  have  the  following 
information  gain  G(Xk), 


Xk  = 


G(Xk)  = 


0-00  — -  0  x y 0 
2  2 


-2 


6  +  4, 


d 2  (y  —  d/2)2  +  x 2 


_ 1 _ ) 

(y  +  d/2)2  +  x2  ) 


To  determine  the  optimal  y  value  for  any  x  position  of  robot  2,  we  can  take 
the  derivative  of  the  information  gain  with  respect  to  y,  as  follows, 


9G(Xk)  =  ga2  f  y-d/2  y  +  d/2  \ 

9y  z  \  ((y  -  d/2)2  +  x2)2  ((y  +  d/2)2  +  x2)2  ) 

By  setting  the  derivative  to  zero,  we  can  find  the  y  that  maximizes  the 
information  gain.  The  solution  is  y  =  0,  independent  of  the  robot’s  x  position. 
This  implies  that  for  a  robot  that  is  “leaping”  past  the  two  stationary  robots 
along  the  direction  of  the  x-axis,  the  optimal  trajectory  is  for  the  robot  to 
trace  the  x-axis  itself,  with  position  y  =  0  throughout  the  path,  and  pass 
through  the  other  two  robots.  This  can  be  generalized  for  any  position  of 
robots  0  and  1  in  the  plane:  the  trajectory  of  robot  2  should  move  along 
the  equidistant  path  between  the  two  stationary  robots  to  achieve  maximum 
information  gain. 


1  3  5  7  9 


Fig.  3.  This  is  the  leap-frog  path  we  use  for  simulations  and  experiments,  which 
is  based  on  an  analysis  of  information  gain  for  a  team  of  three  robots. 


We  introduce  a  new  three  robot  leap-frog  path  in  Fig.  3  to  build  off  this 
result.  To  our  knowledge,  this  is  the  only  path  for  which  the  moving  robot,  at 
every  time  step,  will  trace  the  equidistant  path  between  the  stationary  robots. 
The  implementation  of  this  path  involves  the  trailing  robot  of  an  equilateral 
triangle  configuration  to  pass  through  the  stationary  robots,  establishing  a 
new  position  and  a  new  equilateral  triangle  configuration  on  the  other  side. 
The  robots  then  switch  roles  and  repeat  the  sequence. 
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4.2  Empirical  Results 

To  test  the  generated  path  displayed  in  Fig.  3,  we  have  performed  a  series 
of  Monte  Carlo  simulations  for  three  mobile  robots  performing  cooperative 
localization.  The  robots  (when  moving)  are  instructed  to  drive  at  a  constant 
0.5  m/s  with  a  =  0.006,  (3  =  0.02,  and  7  =  0.003  for  the  motion  noise 
model  in  Eq.  1.  Relative  bearing  measurements  are  obtained  at  10  Hz  and 
are  assumed  to  have  additive  Gaussian  noise  with  a  standard  deviation  of  1 
degree. 

In  Fig.  4,  we  compare  the  results  of  the  Monte  Carlo  simulations  for  three 
different  paths.  Path  (a)  is  a  smoothed  version  of  the  leap-frog  path  designed 
in  Sec.  4,  path  (b)  is  a  trajectory  obtained  when  the  robots  move  in  an 
equilateral  triangle  formation,  and  path  (c)  is  the  same  as  (b)  but  omits  the 
measurements. 


a) 


Fig.  4.  Path  (a)  represents  the  leap-frog  path  presented  in  Sec.  4,  path  (b)  rep¬ 
resents  the  optimal  formation  for  localization,  and  path  (c)  represents  odometry 
only.  The  plot  depicts  the  trace  of  the  sample  covariance  matrix  generated  for  a 
collection  of  1000  Monte  Carlo  simulations. 


Each  path  was  simulated  for  1000  different  trials  with  randomly  generated 
noise  for  measurements  and  motion.  While  the  estimate  of  the  state  for  each 
trial  follows  the  intended  path  due  to  feedback  control,  the  actual  state  for 
each  trial  is  affected  by  the  noise  and  drifts  from  the  path.  We  measure  the 
filter  performance  by  observing  the  distribution  of  the  robot  state  over  all 
trials.  A  larger  spread  of  data  points  implies  worse  tracking  of  the  actual  state. 
To  quantify  the  performance,  we  compute  the  trace  of  the  sample  covariance 
matrix  for  the  actual  robot  state  computed  over  the  1000  simulated  trials. 
Fig.  4  shows  a  graph  of  this  metric  for  each  of  the  path  types.  We  note  that 
our  leap-frog  path  outperforms  the  optimal  formation. 


5  Experimental  Evaluation 

The  motivating  application  for  this  work  is  GPS-denied  autonomous  cov¬ 
erage,  for  which  accurate  positioning  is  of  critical  importance.  We  apply  a 
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smoothed  version  of  the  path  depicted  in  Fig.  3  to  a  team  of  real  robots 
performing  coverage. 


5.1  Experimental  Setup 

We  use  three  robots  for  outdoor  localization  experiments,  each  of  which  is 
based  on  the  Learning  Applied  to  Ground  Vehicles  (LAGR)  platform  [10]. 
Each  mobile  robot  has  three  on  board  computers,  wheel  encoders  for  odom- 
etry,  and  a  set  of  four  stereo  cameras  mounted  above  the  chassis.  We  choose 
to  treat  the  4  stereo  pairs  as  8  individual  bearing  sensors  in  order  to  reduce 
the  computational  load.  The  filter  is  implemented  according  to  Sec.  3  and  is 
centralized  (meaning  that  only  one  of  the  robots  is  running  the  Kalman  filter 
at  any  given  time).  The  robots  measure  bearing  to  each  other  by  detecting 
large  red  spheres  in  the  camera  images  with  a  circle  Hough  transform  [11]. 
See  Fig.  1  for  a  photograph  of  the  robots  in  their  experimental  configurations. 


5.2  Coverage  Experiments 


Fig.  5.  An  experiment  on  the  football  field  at  Gesling  Stadium  at  Carnegie  Mellon 
University.  The  three  photos  here  are  extracted  from  a  video  sequence  used  to 
record  the  ground  truth  position  of  the  robots  throughout  the  experiment. 


The  photos  in  Fig.  5  are  from  a  video  sequence  recorded  during  one  of 
our  coverage  experiments  at  Gesling  stadium  at  Carnegie  Mellon  University. 
We  are  able  to  use  this  video  sequence  to  post-process  ground  truth  position 
data  for  each  of  the  three  robots.  In  order  to  do  this,  we  compute  a  camera 
projection  matrix  based  on  known  3D  points  in  the  image  (the  markings  on 
the  football  field).  Then,  after  manually  selecting  a  robot’s  location  in  the 
image  plane,  we  can  infer  its  3D  position  via  its  projection  onto  the  plane. 

The  estimated  path  of  the  robots  during  the  aforementioned  coverage 
experiment  is  drawn  in  Fig.  6  (a).  This  estimated  path  follows  an  overall 
desired  path  that  is  composed  of  multiple  smoothed  versions  of  the  leap-frog 
path  displayed  in  Fig.  3  pasted  together  so  as  to  sweep  a  region  for  coverage. 
The  travel  distance  for  each  robot  was  approximately  140m  during  the  ex¬ 
periment.  The  covered  area  was  20m  x  30m.  Ground  truth  points  are  shown 
in  Fig.  6  for  comparison  and  to  help  quantify  the  localization  performance. 
Fig.  6  (b)  shows  the  odometry-only  estimate  of  the  path.  It  is  worth  noting 
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Fig.  6.  (a)  shows  the  estimated  trajectory  of  the  robots  during  a  coverage  ex¬ 
periment  along  with  ground  truth  points  (yellow  circles).  The  final  ground  truth 
position  is  displayed  with  yellow  stars,  (b)  shows  the  same  experiment  with  a  filter 
that  ignores  the  bearing  measurements  (dead  reckoning  only). 


how  erroneous  this  path  is  (most  likely  due  to  turning  biases  from  unequally 
inflated  tires)  and  how  effective  the  filtering  is  in  correcting  the  erroneous 
path  to  agree  closely  with  the  ground  truth  data. 

The  true  final  position  of  the  three  robots  is  also  depicted  in  Fig.  6.  The 
approximated  error  between  the  filtered  estimate  and  the  measured  final 
ground  truth  pose  is:  1.09  meters  for  robot  0  (the  red  trajectory  in  Fig.  6), 
1.01  meters  for  robot  1  (the  green  trajectory  in  Fig.  6)  and  1.15  meters  for 
robot  2  (the  blue  trajectory  in  Fig.  6).  We  note  that  the  accuracy  of  these 
ground  truth  measurements  is  subject  to  possible  user  error  when  manu¬ 
ally  selecting  the  image  points  that  correspond  to  the  robots  in  the  video 
sequence. 

The  localization  accuracy  for  this  experiment  is  quite  remarkable  for  this 
type  of  outdoor  robot.  The  presence  of  wheel  slippage  coupled  with  a  difficult 
terrain  can  cause  severe  drift  in  the  odometry  estimate  over  a  path  this 
long.  Additionally,  the  measurements  that  we  acquire  with  vision  can  be 
fairly  noisy  compared  to  more  expensive  laser  range  finders.  But  when  an 
informative  path,  such  as  the  one  we  present  in  Sec.  4,  is  used,  the  accuracy 
improves  significantly,  as  shown  in  our  experiment. 


6  Conclusion 

This  work  presents  a  leap-frog  path  designed  to  aid  localization  for  a  team  of 
three  robots.  The  path  is  designed  such  that  the  moving  robot  travels  along  a 
path  that  adds  maximal  information  to  the  filter.  The  resulting  path  outper¬ 
forms  the  optimal  formation-based  path.  The  experiment  that  we  describe 
is,  to  our  knowledge,  one  of  the  largest  outdoor  GPS-denied  coverage  results, 
successful  in  part  because  of  precise  localization. 
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Although  we  believe  the  absolute  optimal  path  (in  terms  of  localization 
accuracy)  for  a  team  of  three  robots  would  involve  a  leap-frog  motion  strategy, 
the  path  we  introduce  in  this  paper  is  most  likely  not  optimal.  Precisely 
defining  the  optimal  path  is  still  an  open  problem,  which  may  require  running 
an  exhaustive  simulation  to  optimize  over  all  possible  combinations  of  motion 
inputs:  a  task  that  would  be  computationally  infeasible. 

Also,  this  paper  has  focused  on  developing  paths  for  a  team  of  three 
robots.  We  believe  that  a  three  robot  team  is  a  good  fit  for  applications  that 
require  accurate  positioning,  in  part  because  three  robots  can  provide  proper 
triangulation.  That  said,  it  is  always  beneficial  to  add  additional  information 
to  the  Kalman  filter,  and  a  way  to  do  this  would  be  to  add  additional  robots. 
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A  Location-Based  Algorithm  for 
Multi-hopping  State  Estimates 
within  a  Distributed  Robot  Team 
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Abstract  Mutual  knowledge  of  state  information  among  robots  is  a  cru¬ 
cial  requirement  for  solving  distributed  control  problems,  such  as  coverage 
control  of  mobile  sensing  networks.  This  paper  presents  a  strategy  for  ex¬ 
changing  state  estimates  within  a  robot  team.  We  introduce  a  deterministic 
algorithm  that  broadcasts  estimates  of  nearby  robots  more  frequently  than 
distant  ones.  We  argue  that  this  frequency  should  be  exponentially  propor¬ 
tional  to  an  importance  function  that  monotonically  decreases  with  distance 
between  robots.  The  resulting  location-based  algorithm  increases  propaga¬ 
tion  rates  of  state  estimates  in  local  neighborhoods  when  compared  to  simple 
flooding  schemes. 


1  Introduction 

Robots  in  a  team  need  to  communicate  state  estimates  to  self-organize.  Since 
many  applications  desire  the  team  to  spread  over  large-scale  domains,  result¬ 
ing  distances  between  robots  can  become  larger  than  their  capable  peer-to- 
peer  transmission  ranges.  These  configurations  require  multi-hop  network¬ 
ing  to  distribute  state  information  over  the  entire  system.  To  facilitate  the 
transportation  of  data  packets  in  a  multi-hop  fashion,  many  mobile  ad  hoc 
networks  implement  sophisticated  routing  schemes.  Due  to  the  mobile  nature 
of  such  networks,  these  schemes  consume  a  significant  amount  of  communi¬ 
cation  capacity  for  maintaining  knowledge  about  network  topology.  While 
some  routing  strategies  take  spatial  configurations  into  account,  the  robots 
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are  agnostic  to  the  relevance  of  the  actual  data  being  transferred.  There  is  no 
concept  of  data  importance  from  the  robots’  point  of  view,  often  resulting  in 
the  suboptimal  allocation  of  communication  resources  (e.g.  time,  bandwidth, 
power)  to  transfer  packets. 

The  strategy  in  this  paper  allows  robots  to  better  manage  communication 
resources  for  relaying  state  estimates.  Since  the  collaboration  of  robots  takes 
place  in  the  physical  world,  spatial  relationships  between  robot  states  can 
give  insight  into  the  importance  of  transferring  each  estimate.  This  location- 
based  approach  gives  a  quantitative  answer  to  the  question:  how  important 
is  it  for  one  robot  to  broadcast  state  information  about  another  robot?  We 
represent  the  importance  of  transmitting  a  state  estimate  as  a  function  that 
is  inversely  proportional  to  the  distance  between  robots. 

From  this  importance  function  we  develop  a  deterministic  algorithm  that 
ensures  state  estimates  propagate  throughout  a  robot  network.  The  proposed 
location-based  algorithm  is  efficient  in  terms  of  bandwidth  and  computational 
complexity;  it  does  not  require  network  topology  information  to  be  transmit¬ 
ted  or  computed.  We  used  Monte  Carlo  simulations  to  show  increased  propa¬ 
gation  rates  of  state  estimates  in  local  neighborhoods.  Then  with  real  control 
and  wireless  hardware,  we  simulated  a  nine  robot  team  running  a  Voronoi 
coverage  controller  to  show  the  algorithm’s  effectiveness  in  solving  distributed 
control  problems.  Experimental  results  for  the  propagation  of  state  estimates 
are  also  presented  with  five  AscTec  Hummingbird  quad-rotor  flying  robots 
and  four  stationary  robots. 

A  substantial  body  of  work  exists  on  location-based  routing  for  mobile  ad 
hoc  networks.  Haas  proposed  a  zone-based  routing  protocol  using  a  radius 
parameter  to  reduce  the  number  of  control  messages  [4].  Ni  et  al.  developed  a 
distance-based  scheme  to  decide  when  a  node  should  drop  a  rebroadcast  [5], 
while  Sun  et  al.  adapted  a  similar  scheme  for  setting  defer  times  [7].  Ying  et 
al.  discussed  how  these  ad  hoc  schemes  influence  flooding  costs  [1]. 

Our  proposed  algorithm  is  related  to  this  body  of  work  in  that  location  is 
used  to  broadcast  information  through  a  mobile  ad  hoc  network.  However, 
instead  of  routing  actual  data  packets  to  a  predetermined  receiver,  we  are 
deterministically  transmitting  state  information  to  be  used  by  the  entire  team 
of  robots.  This  allows  all  transmissions  to  be  treated  as  simple  broadcasts,  for 
which  the  sender  uses  the  algorithm  to  select  state  estimates.  This  strategy 
is  applicable  for  many  distributed  control  problems,  such  as  coverage  control 
algorithms  for  mobile  sensing  networks. 


2  Importance  of  Broadcasting  State  Estimates 

A  common  assumption  for  distributed  control  algorithms  is  that  robots  have 
access  to  state  estimates  of  other  nearby  robots.  This  assumption  is  often 
translated  into  unrealistic  requirements  on  communication  range.  The  most 
common  requirement  is  that  estimates  need  to  be  directly  shared  between 
robots  that  are  within  a  specified  distance.  Another  common  requirement  is 
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Fig.  1  A  simple  example 
where  robots  i  and  j  share 
a  Voronoi  boundary  but 
cannot  communicate  their 
state  estimates  directly. 
This  problem  is  easily 
resolved  using  a  mobile 
ad-hoc  network  topology  to 
route  information  through 
robot  k. 


Comm 

region 


Voronoi 

region 


for  information  to  be  shared  between  robots  of  a  defined  spatial  relationship 
(e.g.  adjacent  Voronoi  regions  [2]  or  overlapping  fields  of  view  [6]). 

These  communication  requirements  are  too  simplistic  to  be  realized  in 
practice.  Actual  network  topologies  depend  on  more  than  simple  distance 
criteria,  such  as  environment  geometry,  channel  interference,  or  atmospheric 
conditions.  Even  if  transmission  ranges  are  ideal  in  the  physical  sense  (e.g. 
the  ideal  disk  model),  spatial  relationships  for  certain  distributed  controllers 
cannot  guarantee  peer-to-peer  connectivity.  Figure  1  shows  a  configuration 
where  a  direct  communication  link  cannot  be  created  between  the  Voronoi 
neighbors  i  and  j.  Moreover,  robots  that  are  spatially  disconnected  may  de¬ 
cide  not  to  route  state  estimates  to  one  another.  If  they  move  to  become 
spatially  connected,  the  lack  of  shared  data  will  prevent  the  robots  from 
learning  about  their  new  neighbors.  Thus,  no  new  communication  links  will 
be  established.  We  are  motivated  by  these  serious  and  unavoidable  compli¬ 
cations  to  develop  an  algorithm  that  ensures  state  estimates  flow  throughout 
a  team  of  robots. 


2.1  Broadcast  Scheme 

Consider  n  robots  moving  in  a  space1,  V.  Each  robot,  i  E  {1, . . . ,  n},  knows 
its  current  state,  Pi(t)  E  P,  by  some  means  of  measurement  (e.g.  GPS  or 
visual  localization).  We  propose  that  each  robot  maintains  a  list  of  state 
estimates,  [pi(tii), . . .  ,Pn(Un)]i  where  Uj  denotes  a  time  stamp  at  which  robot 
V s  estimate  of  robot  j’s  state  was  valid.  We  have  that  Uj  <  t  and  tu  =  t. 
Each  robot’s  state  estimate  is  initialized  to  infinity  to  indicate  that  a  valid 
estimate  is  lacking,  except  for  its  own  state  which  is  always  current. 

We  desire  to  communicate  state  estimates  throughout  the  robot  network. 
For  simplicity,  we  use  Time  Division  Multiple  Access  (TDM A) 2  to  divide 
the  data  stream  into  time  slots  of  equal  length,  m.  During  a  time  slot,  one 


1  Although  it  is  easiest  to  think  of  the  space  being  M2  or  M3,  the  strategy  we  describe  is 
equally  useful  with  more  detailed  state  estimates  (e.g.  velocity,  acceleration,  joint  positions, 
state  machine  information,  etc.) 

2  In  this  paper  we  primarily  discuss  implementing  the  proposed  strategy  using  TDM  A; 

however,  many  other  channel  access  methods  are  appropriate  (e.g.  FDMA  or  CDMA). 
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assigned  robot  is  allowed  to  broadcast  over  the  shared  frequency  channel. 
Other  robots  broadcast  one  after  the  other  in  a  predetermined  order.  One 
complete  broadcast  cycle  is  referred  to  as  a  frame. 

To  broadcast  its  own  state  estimate  once  per  frame,  the  robot’s  time  slot 
must  be  long  enough  to  transmit  the  estimate  and  an  associated  time  stamp. 
Such  a  time  slot  is  considered  to  have  length  of  m  =  1.  Clearly  time  slots  of 
unit  length  are  not  sufficient  to  transmit  information  throughout  the  network; 
each  robot  would  only  be  updated  with  the  state  estimate  of  its  neighbors 
on  the  network.  For  multi-hop  networking,  the  robots  need  longer  time  slots 
to  broadcast  the  estimates  of  other  robots. 

One  naive  strategy  is  to  assign  a  time  slot  length  equal  to  the  number 
of  robots,  m  =  n,  so  that  each  robot  can  broadcast  its  entire  list  of  state 
estimates,  thus  creating  a  simple  flooding  scheme.  Robots  that  are  adjacent 
on  the  network  use  this  information  to  update  their  own  list,  retaining  only 
the  most  current  state  estimates.  The  process  is  repeated  for  each  time  slot, 
naturally  propagating  state  estimates  throughout  the  network  without  the 
need  of  a  complicated  routing  protocol. 

Although  simple  to  implement,  this  strategy  is  not  scalable  for  a  large 
number  of  robots.  Consider  the  rate  a  system  can  cycle  through  all  time  slots 
to  complete  one  frame.  This  frame  rate,  77,  gives  insight  into  how  quickly 
state  estimates  are  being  forwarded,  and  therefore  how  confident  distributed 
controllers  can  be  in  using  the  estimates.  For  a  network  of  fixed  baud  rate,  7*5, 
the  maximum  frame  rate3  is  given  by  max (77)  =  r^/mnb^  where  b  is  the  data 
size  of  a  state  estimate  and  its  associated  time  stamp.  For  m  =  n,  increasing 
the  number  of  robots  in  the  system  will  decrease  the  frame  rate  quadratically. 
This  inherent  trade-off  provides  motivation  to  reduce  the  length  of  the  time 
slot;  however,  if  a  robot  cannot  broadcast  all  state  estimates  within  one  time 
slot,  which  estimates  are  considered  more  important  to  broadcast? 


2.2  Importance  Function 

Many  distributed  controllers  are  dependent  on  spatial  relationships  between 
robots.  When  selecting  which  state  estimate  to  broadcast,  the  selection  pro¬ 
cess  should  also  depend  on  these  relationships.  This  makes  sense  because  a 
robot’s  state  is  more  likely  to  be  useful  to  controllers  in  proximity.  However,  it 
cannot  be  considered  useless  to  controllers  that  are  distant  due  to  the  mobile 
nature  of  the  system.  We  propose  that  the  importance  of  robot  i  broadcast¬ 
ing  robot  f  s  state  estimate  is  inversely  proportional  to  the  distance  between 
robot  states. 

Since  the  robots  only  have  access  to  the  state  estimates  they  receive,  a 
distance  estimate  is  used  to  give  the  following  importance  function 

=  d  (pi(t),pj(Uj))~a  (1) 


3  We  are  ignoring  overhead  associated  with  TDM  A  (e.g.  guard  periods,  checksums,  etc.) 


A  Location-Based  Algorithm  for  Multi-hopping  State  Estimates 


5 


where  d(-,  •)  >  0  is  a  distance  function  and  a  G  (0,  oo)  is  a  free  parameter, 
both  of  which  are  selected  for  the  given  distributed  controller.  For  example, 
a  Voronoi  coverage  controller  dependent  on  linear  spatial  separation  may  use 
a  Euclidean  distance  function  with  a  =  1.  This  same  distance  function  is  ap¬ 
propriate  for  a  sensor-based  controller  dependent  on  light  intensity,  although 
a  =  2  may  be  used  since  light  intensity  decays  quadratically  with  distance 
from  the  source.  Conversely,  the  distance  function  does  not  need  to  be  Eu¬ 
clidean  or  even  of  continuous  topology,  such  as  for  truss  climbing  robots  with 
a  finite  configuration  space.  In  any  case,  a  robot  should  consider  its  own  state 
estimate  to  be  the  most  important  to  broadcast.  This  is  reflected  in  the  model 
since  fa  is  infinite  for  any  valid  d(-,  •)  and  a. 


3  Location-Based  Algorithm  for  Broadcasting  States 

We  use  the  importance  function  in  Equation  (1)  to  develop  a  deterministic 
algorithm.  For  a  given  time  slot,  this  algorithm  selects  which  state  estimates 
a  robot  will  broadcast.  We  first  describe  a  probabilistic  approach  to  help 
formulate  the  final  algorithm. 


3.1  Probabilistic  Approach 

Consider  a  robot  that  needs  to  select  m  state  estimates  to  broadcast  during 
its  time  slot.  We  provided  motivation  in  Section  2.2  that  some  selections  are 
more  important  than  others.  However,  the  robot  should  not  systematically 
select  the  state  estimates  associated  with  the  highest  importance;  doing  so 
can  prevent  estimates  from  fully  dispersing  throughout  the  system.  Instead, 
we  propose  that  the  probability  of  robot  i  selecting  the  state  estimate  of 
robot  j  is 


plA(t)  = 


fij  {ty 


2 keMi  fik{t) 


j  e  Mi 


(2) 


where  Mi  is  the  set  of  robot  indices  associated  with  selectable  estimates. 

Prior  to  the  first  selection  for  a  given  time  slot,  Mi  is  the  set  of  all  robot 
indices.  From  the  full  set  the  robot  always  selects  its  own  state  since  it  has 
infinite  importance.  The  robot  then  removes  its  index  from  Mi  to  prevent 
wasting  bandwidth.  Since  Equation  (2)  is  a  valid  probability  mass  function, 
the  robot  can  simply  choose  the  next  state  estimate  at  random  from  the 
corresponding  probability  distribution,  then  remove  the  corresponding  index 
from  Mi.  This  means  estimates  of  closer  robots  are  more  likely  to  be  chosen 
than  ones  that  are  farther  away.  By  repeating  this  process,  the  entire  time 
slot  of  length  m  can  be  filled  in  a  straightforward,  probabilistic  manner. 
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Algorithm  1  Deterministic  Method  for  Selecting  State  Estimates 


n  is  the  number  of  robots  in  the  system  and  m  is  the  time  slot  length. 

Require:  Robot  i  knows  its  state  Pi(t)  and  the  state  estimate  of  other  robots  Pjitij). 
Require:  Robot  i  knows  its  running  counter  [cn, . . . ,  Cin]. 

Mi  <-  n};  Ad  <-  0; 


for  1  to  m  do 


(*) 


f  ij  (0 


YlkeM-i  f ik  h)  ’ 

k  <—  argmaxfcGM.(cifc); 

end  for 
return  Ad 


Vj  €  Mi;  Cij  <-  Cij[l  -  V?  e  Mi] 

Mi  ^  Mi\{k};  Afi  <—  Ad  U  {fc};  cik  1; 


5.2  Deterministically  Selecting  Estimates 


It  is  not  ideal  in  practice  to  probabilistically  select  which  state  estimates  to 
broadcast.  Consecutive  selections  of  a  particular  robot  index  can  be  separated 
by  an  undesirably  long  period  of  time,  especially  concerning  distant  robots. 
By  developing  a  location-based  deterministic  algorithm,  we  can  increase  the 
average  rate  at  which  all  state  estimates  of  a  given  time  stamp  will  propagate 
throughout  a  team.  In  the  deterministic  case,  propagation  time  is  bounded 
above  by  the  longest  path  taken  among  the  estimates.  No  such  bound  ex¬ 
ists  in  the  probabilistic  case,  resulting  in  a  positively  skewed  distribution  of 
propagation  times  and  a  larger  mean. 

We  propose  that  each  robot  maintains  a  list  of  counters,  [cn, _ ,  c*n], 

which  are  initially  set  to  a  value  of  one.  Using  the  probability  mass  function  in 
Equation  (2),  each  counter  represents  the  probability  that  the  corresponding 
index  has  not  been  selected.  Consider  a  robot’s  first  selection,  which  will 
always  be  its  own  index.  The  probability,  (£),  of  selecting  index  i  is  equal 

to  one,  while  all  other  probabilities,  P^.(t)  subject  to  j  7^  i,  are  equal  to 
zero.  This  implies  that  the  counter  cu  is  multiplied  by  [1  —  Pj^.(t)]  =  0, 
or  a  zero  probability  of  not  being  selected,  while  all  other  counters,  Qj,  are 
multiplied  [1  —  Py^  .(t)]  =  1,  or  a  probability  of  one.  By  selecting  the  index 
with  the  lowest  counter  value,  we  are  deterministically  guiding  our  method 
to  behave  according  to  the  probability  distribution  described  by  Equation 
(2).  The  selected  index  (in  this  case  i)  is  removed  from  the  set  Afi,  and  its 
corresponding  counter  ( ca )  is  reset  to  a  value  of  one.  This  process  is  iteratively 
applied  to  completely  fill  a  time  slot  with  m  state  estimates,  with  counters 
maintaining  their  values  between  frames.  The  complete  deterministic  strategy 
of  0(mn )  time  is  given  in  Algorithm  1. 


4  Simulations  and  Experiments 

We  provide  insight  into  the  performance  of  the  location-based  algorithm  in 
three  ways:  we  conducted  Monte  Carlo  simulations  for  100  stationary  robots, 
we  used  real  control  and  wireless  hardware  to  simulate  nine  robots  running  a 
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Fig.  2  This  figure  shows 
the  average  propagation 
time  for  the  location-based 
algorithm  running  on  a 
10  X  10  stationary  robot 
grid.  Averages  were  taken 
over  1000  Monte  Carlo 
simulations.  For  small  sub¬ 
graphs  (i.e.  2x2),  update 
rates  of  state  estimates 
increased  with  decreasing 
time  slot  lengths.  For  larger 
subgraphs,  the  optimal 
length  was  around  m  =  7. 


Length  of  time  slot 

(#  of  broadcasted  estimates  per  time  slot) 


distributed  coverage  algorithm,  and  we  implemented  this  hardware  on  five  fly¬ 
ing  and  four  stationary  robots.  We  first  describe  the  Monte  Carlo  simulations 
used  to  measure  information  propagation  throughout  the  robot  team.  Prop¬ 
agation  time  is  the  main  performance  metric  for  the  algorithm.  This  metric 
depends  on  the  length  of  the  time  slot,  or  in  other  words,  the  number  of  state 
estimates  communicated  during  one  robot  broadcast.  We  compare  these  re¬ 
sults  to  the  case  when  the  time  slot  length  equals  the  number  of  robots,  since 
allowing  robots  to  broadcast  every  state  estimate  is  the  simplest  multi-hop 
scheme.  This  scheme  is  referred  to  as  simple  flooding. 

In  a  MATLAB  environment,  we  simulated  a  team  of  100  stationary  robots 
arranged  in  a  10  x  10  square  grid.  Each  robot,  initialized  knowing  only  its 
own  state  estimate,  was  able  to  receive  broadcasts  from  its  adjacent  neighbors 
along  the  vertical  and  horizontal  directions.  Each  robot  ran  Algorithm  1  in 
distributed  fashion.  Over  1000  Monte  Carlo  simulations  were  executed  for 
time  slots  of  varying  lengths,  with  each  run  having  a  random  order  for  the 
time  slot  assignments.  For  the  2x2, 4x4, 6x6,  and  8x8  subgraphs  centered 
on  the  10  x  10  graph,  we  measured  the  time  it  took  for  all  subgraph  members 
to  exchange  state  estimates. 

Figure  2  plots  average  propagation  time  for  the  Monte  Carlo  simulations. 
For  the  smallest  subgraph  (i.e.  2x2),  state  estimates  propagated  faster  with 
smaller  time  slot  lengths.  This  relationship  makes  sense  since  we  are  maximiz¬ 
ing  the  frame  rate,  thus  increasing  update  rates  for  the  local  state  estimates 
of  highest  importance.  As  the  subgraph  size  increases,  very  small  time  slot 
lengths  become  less  effective  at  propagating  estimates,  especially  between 
robots  at  opposite  sides  of  the  subgraph.  By  using  a  slightly  larger  time 
slot  length,  a  significant  improvement  in  performance  over  simple  flooding  is 
obtained;  propagation  times  for  all  subgraphs  decreased  by  more  than  47% 
using  a  time  slot  length  of  m  =  7.  Analyzing  such  Monte  Carlo  plots  provides 
a  heuristic  technique  for  selecting  an  acceptable  time  slot  length  for  a  given 
control  problem. 

We  then  tested  the  algorithm  in  a  simulated  robot  scenario  using  real  con¬ 
trol  and  wireless  hardware.  We  implemented  a  Voronoi  coverage  controller  [2] 
on  nine  custom  ARM  microcontroller  modules,  each  using  a  900  MHz  xBee 
module  to  wirelessly  broadcast  state  estimates  during  its  assigned  time  slot. 
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Fig.  3  Coverage  costs  are  shown  for  a  nine  robot  system  simulated  on  real  hardware 
running  a  Voronoi  coverage  controller.  The  system  has  a  frame  rate  of  1.7  Hz  when  using 
a  no-hop  scheme  (m  =  1).  The  system  initially  performs  well,  but  its  inability  to  multi¬ 
hop  state  estimates  resulted  in  a  suboptimal  final  configuration.  A  simple  flooding  scheme 
(m  =  9)  improved  steady  state  performance,  however,  the  slow  frame  rate  of  0.2  Hz  caused 
the  system  to  initially  oscillate  in  a  high  cost  configuration.  The  location-based  algorithm 
with  a  time  slot  of  length  m  =  3  performed  the  best  overall  by  combining  fast  update  rates 
with  multi-hop  capabilities.  The  final  Voronoi  configurations  for  the  algorithm  and  no-hop 
simulations  are  also  shown. 


Each  control  module  simulated  the  dynamics  of  a  flying  robot,  creating  a 
virtual  distributed  robot  team.  In  addition,  a  communication  range  was  im¬ 
plemented  such  that  packets  from  “out-of-range”  robots  were  automatically 
dropped.  We  investigate  the  performance  of  the  location-based  algorithm  in  a 
simple  scenario  where  nine  virtual  robots  were  tasked  to  cover  a  square  area. 
For  this  scenario  the  optimal  configuration  is  for  the  robots  to  be  arranged 
in  a  3  x  3  square  grid. 

For  the  location-based  algorithm,  a  time  slot  length  of  m  =  3  was  selected 
using  the  Monte  Carlo  technique  previously  discussed.  We  also  selected  the 
Euclidean  distance  function  with  a  =  1  given  that  the  Voronoi  coverage  con¬ 
troller  is  linearly  dependent  on  such  distance.  Each  state  estimate  for  the 
virtual  flying  robot  is  constructed  of  six  32-bit  integers  (robot  identification, 
time  stamp,  latitude,  longitude,  altitude,  and  yaw),  resulting  in  a  data  size 
of  192  bits.  Given  that  the  wireless  hardware  could  reliably  operate  at  3000 
baud,  the  resulting  frame  rate  was  about  0.6  Hz.  For  comparison,  the  simple 
flooding  (m  =  9)  and  no- hop  (m  =  1)  schemes  ran  at  about  0.2  Hz  and 
1.7  Hz,  respectively.  Figure  3  shows  the  resulting  coverage  cost  profiles  from 
these  simulations.  The  location-based  algorithm  had  better  initial  perfor¬ 
mance  than  the  simple  flooding  scheme  and  better  steady  state  performance 
than  the  no- hop  scheme.  The  final  Voronoi  configurations  for  the  algorithm 
and  no-hop  simulations  are  also  shown. 

Finally,  we  implemented  the  location-based  algorithm  on  five  AscTec  Hum¬ 
mingbird  quad-rotor  flying  robots  [3]  and  four  stationary  robots,  thus  creating 
a  nine  robot  team.  Each  flying  robot  was  equipped  with  an  AscTec  AutoPilot 
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Fig.  4  An  example  mo¬ 
bile  ad  hoc  network  graph 
from  the  quad-rotor  flying 
robot  experiment  is  plot¬ 
ted  in  Google  Earth.  For 
this  nine  robot  system,  the 
location-based  algorithm 
routes  state  estimates 
through  the  entire  team. 
The  bounded  environment 
from  the  downward  facing 
camera  coverage  problem  is 
also  shown. 


Fig.  5  This  plot  shows 
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board  capable  of  capturing  GPS,  altitude,  and  yaw  positions.  The  previously 
described  control  and  wireless  modules  were  installed  on  these  AutoPilot 
boards.  In  addition,  four  separate  modules  were  deployed  at  fixed  locations 
to  represent  the  stationary  robots. 

This  experimental  setup  was  designed  to  run  a  downward  facing  camera 
coverage  controller  for  hovering  robots  [6].  Since  this  controller  has  a  spa¬ 
tial  dependence  similar  to  the  Voronoi  coverage  controller,  the  same  time 
slot  length,  distance  function,  and  a  were  used.  Figure  4  shows  the  network 
topology  of  a  random  deployment  configuration  prior  to  starting  the  coverage 
controller.  Here  we  limited  the  communication  range  to  30  meters;  in  previ¬ 
ous  experiments  we  were  able  to  produce  links  in  excess  of  100  meters.  Figure 
5  plots  the  time  stamp  of  the  most  current  state  estimates  as  received  by  the 
stationary  robot  beta,  which  can  be  considered  the  “worst  case”  receiver  since 
it  is  the  most  remote  robot  in  the  team.  As  previously  discussed,  beta’s  own 
state  estimate  is  always  considered  to  be  current.  Estimates  of  other  robots 
are  updated  as  they  are  received  by  team  broadcasts,  whether  directly  from 
the  originating  robot  or  indirectly  in  a  multi-hop  fashion.  Since  closer  robots 
are  considered  more  important  in  the  algorithm  formulation,  this  results  in 
their  state  estimates  being  more  current  with  more  frequent  updates. 
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5  Conclusion 

In  this  paper  we  presented  a  location-based  strategy  for  exchanging  state  es¬ 
timates  in  a  distributed  robot  team.  We  developed  a  deterministic  algorithm 
that,  based  on  an  importance  function,  broadcasts  estimates  of  nearby  robots 
more  frequently  than  distant  ones.  Simulations  using  real  control  and  wireless 
hardware  show  that  the  algorithm  outperforms  simple  flooding  schemes  for 
large  robot  networks. 

Our  experiments  consisting  of  five  Asctec  Hummingbird  quad-rotor  fly¬ 
ing  robots  among  four  stationary  robots  showed  the  successful  exchange  of 
state  estimates  in  a  multi-hop  fashion.  Using  the  location-based  algorithm, 
we  successfully  ran  the  coverage  controller  from  [6]  on  three  flying  robots 
with  downward  facing  cameras.  Coverage  results  for  5+  flying  robots  will  be 
presented  in  future  publications. 

We  desire  to  further  develop  this  work  to  exploit  the  spatial  reuse  of  time 
slots  for  robots  separated  by  multiple  hops.  This  direction  allows  for  virtually 
infinite  team  sizes  and  spatial  coverage. 
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Abstract  Maintaining  accurate  localization  of  an  autonomous  underwater  vehicle 
(AUV)  is  difficult  because  electronic  signals  such  as  GPS  are  highly  attenuated 
by  water  making  established  land-based  localization  systems,  such  as  GPS,  useless 
underwater.  Instead  we  propose  an  alternative  approach  which  integrates  position 
information  of  other  vehicles  to  reduce  the  error  and  uncertainty  of  the  on-board 
position  estimates  of  the  AUV.  This  approach  uses  the  WHOI  Acoustic  Modem  to 
exchange  vehicle  localization  estimates  —  albeit  at  low  transmission  rates  —  while 
simultaneously  estimating  inter- vehicle  range.  The  performance  capabilities  of  the 
system  were  tested  using  Oceanserver’s  Iver2  and  the  MIT  Scout  kayaks. 


1  Introduction 

Localization  or  navigation  of  vehicles  using  only  onboard  local  sensors,  such  as  a 
Doppler  Velocity  Logger  (DVL)  or  Inertial  Measurement  Unit  (IMU),  are  certain  to 
experience  accumulated  positioning  error.  One  can,  of  course,  utilize  more  precise 
sensors  to  reduce  the  rate  of  accumulated  error  —  DVL  units  with  error  accumula¬ 
tion  rates  as  low  as  0.2%  are  commercially  available.  However  this  approach  may 
not  be  satisfactory  due  to  practical,  power  or  financial  limitations. 

Regardless  of  the  platform  used,  the  accumulation  of  error  and  uncertainty  is 
simply  slowed,  rather  than  bounded.  The  result  of  this  is  that  an  AUV  surveying  the 
ocean  floor  or  a  land  robot  building  a  street  map  must  be  halted  on  occasion  so  as  to 
reset  the  position  uncertainty  —  either  by  surfacing  for  a  GPS  fix  or  by  repositioning 
at  a  known  location.  This  procedure  wastes  both  energy  and  time,  requires  a  human 
interface  and  may  be  unacceptable  in  many  operating  environments. 

The  standard  approach  for  bounding  error  underwater  is  Long  Baseline  (LBL). 
Two  or  more  beacons  are  deployed  at  known  locations  —  either  as  buoys  on  the 
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water  surface  or  moored  on  the  seabed.  The  AUV  transmits  an  acoustic  query  to 
the  beacons  which  reply  in  a  manner  which  allows  the  AUV  to  estimate  the  bea¬ 
con/ AUV  range  and  to  then  improve  its  own  position  estimate.  Recent  improve¬ 
ments  to  this  system  have  removed  the  need  for  round-trip  timing  (Synced  LBL) 
and  also  allowed  for  estimation  of  both  range  and  angle  using  an  array  of  receiving 
sensors  (USBL). 

While  these  technologies  are  now  all  commercially  available,  the  mobility  of  the 
AUV  is  restricted  as  typical  coverage  is  limited  to  an  area  within  a  few  kilometers 
of  the  beacon.  To  relax  this  restriction  an  alternative  approach  considers  a  system  in 
which  a  surface  vehicle  (with  access  to  GPS)  or  a  submerged  vehicle  (with  accurate 
dead  reckoning  instrumentation)  communicates  with  a  fleet  of  much  less  accurately 
localized  vehicles  so  as  to  improve  the  positioning  of  the  latter.  One  example  of  this 
approach  is  the  Moving  Long  Base  Line  (MLBL)  navigation  proposed  by  Vaganay 
et  al.  [9],  in  which  typically  two  surface  vehicles  serve  as  mobile  beacons  for  one 
or  more  AUVs.  Other  related  recent  research  has  been  performed  by  Bahr  et  al.  [1], 
Eustice  et  al.  [4]  and  Maczka  et  al.  [5].  It  should  also  be  recognized  that  multi- AUV 
navigation  falls  within  the  wider  problem  of  multi-robot  cooperative  localization, 
see  [6]  for  a  more  general  introduction  to  the  field. 

In  this  paper,  we  describe  experiments  that  extend  the  MLBL  approach  to  situ¬ 
ations  in  which  a  single  surface  vehicle  is  used  to  estimate  the  position  of  a  sub¬ 
merged  AUV  using  range-only  measurements.  In  Section  2  the  basic  framework  of 
this  technique  is  discussed.  Our  algorithm  is  outlined  in  Section  3  followed  by  a 
number  of  modifications  which  improve  performance.  Section  4  presents  the  results 
of  a  combination  of  simulation  and  realistic  experiments  to  illustrate  the  concept. 
Finally  conclusions  drawn  from  the  experiments  and  the  directions  of  future  work 
are  presented  in  Section  5. 


2  Cooperative  Localization  Under  Water 

This  paper  retains  the  framework  for  underwater  localization  previously  introduced 
in  [1]  and  also  used  in  [4].  We  shall  assume  there  to  be  one  surface  vehicle  providing 
the  submerged  fleet  of  vehicles  with  position  information  while  perhaps  operating 
as  a  communications  moderator  —  in  the  dual  role  of  a  Communications  and  Nav¬ 
igation  Aid  (CNA).  Each  of  the  autonomous  underwater  vehicles  maintains  a  dead 
reckoning  filter,  drawing  upon  measurements  of  velocity,  heading  and  depth.  Fi¬ 
nally,  communication  through  the  water  channel  is  possible  using  the  WHOI  Acous¬ 
tic  Modem  —  at  transmission  rates  of  the  order  of  32  bytes  per  10  seconds  —  in  a 
process  which  also  yields  a  time-of-flight  measurement  which  can  be  used  to  esti¬ 
mate  the  inter- vehicle  range. 

There  are  a  number  of  methods  which  could  be  used  to  integrate  the  received 
position  information.  Our  earlier  work,  [1],  proposed  an  algorithm  which  utilized 
the  on-board  dead  reckoning  estimate  of  the  AUV  and  a  pair  of  CNA  range  estimates 
to  form  a  complete  estimate  of  the  AUV  state  vector. 
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The  seabed,  the  water  surface  and  deep  sea  thermoclines  within  the  water  body 
have  the  ability  to  cause  significant  multi-path  signal  interference  and  the  receipt 
of  a  substantial  amount  of  infeasible  outlier  measurements.  A  typical  dataset  was 
illustrated  for  a  regular  Long  Baseline  systems  in  [8].  For  these  reasons  it  would  be 
reasonable  to  assume  that  the  received  measurement  set  obtained  from  the  WHOI 
modem  would  contain  substantial  multi-modality,  thereby  motivating  this  approach. 

However  the  advanced  processing  within  the  WHOI  modem  decoder  has  the  abil¬ 
ity  to  suppress  the  bulk  of  these  effects,  such  that  the  received  measurements  de¬ 
coded  by  the  modem  contain  only  a  moderate  amount  of  noise.  For  this  reason  the 
proposed  approach  instead  uses  an  implementation  of  the  Extended  Kalman  Filter. 

A  particle  filtering  approach  [3]  could  also  have  been  considered  as  this  would 
have  more  accurately  incorporated  the  non-linearity  of  the  correction  step,  however 
because  we  will  maintain  full  control  of  the  CNA’s  motion  this  issue  can  broadly  be 
avoided. 

Previous  proof-of-concept  experiments  illustrated  that  the  range  variance  is  broadly 
independent  of  range  itself,  however  detailed  examination  of  this  was  not  carried  out 
[2].  The  modem  transducer  was  then  directly  clamped  to  the  underside  of  the  kayak. 
Our  more  recent  experiments  have  instead  hung  the  transducer  2-3  meters  below 
the  kayak  hull.  We  expect  less  noise  interference  from  the  kayak  motor  and  less 
reflections  from  the  water  surface  in  this  configuration. 

Figure  1  illustrates  WHOI  modem  range  data  plotted  versus  GPS -derived  ‘ground 
truth’,  as  measured  in  the  Charles  River  adjacent  to  MIT  recently.  Because  the 
ground  truth  distance  between  the  two  vehicles  was  determined  using  imprecise 
GPS  measurements,  it  is  difficult  to  precisely  estimate  the  distribution  of  the  range 
measurements.  Other  issues,  such  as  the  position  of  the  GPS  sensor  relative  to  the 
modem  on  the  kayak  must  also  be  recognized.  In  the  absence  of  precise  ground 
truth,  we  estimate  the  range  variance  to  be  between  4-8m. 


3  Single  Surface  Craft  Cooperative  Navigation 

The  configuration  we  will  consider  in  this  work  will  be  of  a  single  CNA  supporting 
N  underwater  vehicles1 .  Each  AUV  will  maintain  an  estimate  of  its  own  position  and 
uncertainty.  This  estimate  will  be  propagated  using  the  usual  Kalman  prediction  step 
so  as  to  integrate  heading,  forward  and  starboard  velocity  measurements. 

As  mentioned  above,  this  estimate  will  be  corrected  using  range  and  position  in¬ 
formation  relative  to  an  CNA  using  the  WHOI  acoustic  modem.  At  present  the  32 
byte  packet  transmitted  from  the  CNA  shall  contain  latitude,  longitude,  depth  and 
heading  as  well  as  a  UNIX  time-stamp.  Transmission  of  a  packet  consists  of  two 
stages:  first  a  mini  packet  is  transmitted  to  initiate  the  communication  sequence. 
The  inter- vehicle  range  can  be  estimated  using  this  mini  packet.  Following  this,  the 
information  packet  is  transmitted  in  a  process  which  lasts  approximately  5-6  sec- 


1  Subsequent  research  will  aim  to  relax  the  necessity  of  a  dedicated  surface  vehicle 
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onds.  In  all,  it  is  prudent  to  reserve  10  seconds  per  transmission.  Simularly  the  AUV 
will  transmit  a  message  containing  its  own  position  estimate  as  well  the  associated 
covariance  matrix  which  can  be  used  to  help  the  CNA  plan  its  own  supporting  mo¬ 
tion  —  also  requiring  10  seconds  per  transmission. 

It  is  envisaged  that  the  MLBL  will  be  integrated  within  a  multi- AUV  setup  in 
which  use  of  the  communication  channel  is  shared  between  many  communicating 
processes.  As  a  result  the  transmission  rate  of  a  position/range  pair  is  likely  to  be 
substantially  below  one  measurement  per  10  seconds.  Furthermore  only  a  portion 
of  transmitted  messages  will  actually  be  received.  For  these  reasons  it  is  prudent 
to  optimize  the  location  from  which  the  ASC  transmits  so  as  to  maximize  the  ben¬ 
efit  achieved  from  the  correction  step.  Although  a  basic  zig-zag  motion  plan  was 
adopted  in  this  work,  future  work  will  consider  more  elaborate  motion  planning  for 
the  CNA. 


Fig.  1  Analysis  of  range  estimates  derived  from  the  WHOI  Modem.  Upper  Left:  Comparison 
of  modem  range  estimate  (red  dots)  and  range  derived  from  GPS  ‘ground  truth’  (blue  crosses) 
for  each  fully  successful  10  second  transmission  period.  Lower  Left:  Illustration  of  the  frequency 
of  successful  transmissions.  Category  0  represents  an  entirely  failed  transmission;  Category  1: 
successful  range  transmission;  Category  2:  successful  range  and  packet  transmission.  Category 
2  corresponds  to  the  modem  ranges  in  upper  left  plot.  Right:  Histogram  of  range  error  (using 
estimated  range  versus  GPS  ‘ground  truth’  range),  also  illustrated  is  a  normal  distribution  fitted  to 
the  data  (red,  r  =  0.66m,  or  =  7.5m)  and  the  normal  distribution  used  in  the  experiments  in  Section 
4  with  (cyan,  r  =s  0m,  or  =  5m).  This  range  data  corresponds  to  Experiment  1. 
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As  illustrated  in  Figure  1 ,  a  significant  proportion  of  the  (range)  mini  packets  are 
received  without  the  information  packet  —  meaning  that  the  usual  correction  step 
cannot  be  made2 

By  linearly  predicting  the  CNA  position  using  previous  position  estimates,  an 
estimate  of  the  CNA  at  this  time  can  be  formed.  This  estimate  can  then  be  used 
with  the  previously  orphaned  range  measurement  to  allow  another  correction  step  to 
occur.  While  post  processing  of  the  data  from  the  experiments  presented  in  Section 
4  in  this  manner  reduced  the  average  error  by  approximately  one  meter,  in  future  we 
propose  to  introduce  redundancy  into  the  transmitted  messages  so  as  to  avoid  this 
scenario.  See  Section  5  for  more  discussion. 


3.2  Online  Compass  Bias  Correction 

A  Bayesian  filter  -  such  as  a  Kalman  filter  or  particle  filter  -  assumes  that  mea¬ 
surements  are  formed  using  unbiased  estimators.  Heading  is  however  particularly 
difficult  measurement  to  estimate  properly.  Compass  accuracy  can  be  effected  by 
the  characteristics  of  the  local  region,  the  magnetism  of  the  vehicle  itself  and  mag¬ 
netic  declination.  It  is  particularly  severe  for  imprecise  sensors  used  aboard  the  CNA 
platform.  As  a  result,  the  compass  used  in  the  experiments  presented  in  Section  4 
is  a  dominant  source  of  navigation  error.  Typically  compass  bias  is  corrected  using 
a  calibration  process  which  can  be  both  complex  and  time  consuming.  In  this  sce¬ 
nario,  the  EKF  corrections  garnered  using  the  CNA  range  and  position  can  be  used 
to  estimate  the  compass  bias  and  to  remove  its  effect. 

Between  successive  corrections  of  the  EKF,  the  filter  will  be  predicted  according 
to  the  dynamical  model.  The  frequency  of  the  prediction  step  will  be  much  higher 
than  the  correction  step.  The  distance  between  the  posterior  estimate  of  a  correction 
step  at  time  k\  and  the  predicted  position  at  time  k 2  is  the  estimated  relative  distance 


Fig.  2  Compass  Bias  Correction  Example:  MLBL  position  estimate  (blue)  is  corrected  towards 
the  ground  truth  (red)  in  a  consistent  direction.  The  angular  correction  of  the  4  correction  steps, 
@i:4,  can  be  used  to  form  an  estimate  of  the  bias  angle,  which  is  then  removed.  Note  that  multiple 
iterations  of  the  prediction  step  take  place  between  each  correction  step. 


2  For  a  typical  mission  in  the  open  ocean  inter- vehicle  ranges  of  the  order  of  1km  are  expected, 
making  this  an  even  more  significant  issue. 
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traveled  in  that  time 
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where  =  [x^ , }  represents  the  state  vector  at  time  k—1.  The  CNA  position  and 
range  measurement  are  then  integrated  to  correct  the  posterior  position  estimate 


X^2  X^1 


(2) 


If  the  sensors  contributing  to  the  measurement,  z^2 ,  are  unbiased  the  expected  value 
of  the  update  will  be  zero.  However  if  there  exists  a  compass  bias,  the  EKF  will  act 
to  correct  the  filter  in  the  direction  opposite  to  the  bias 


(3) 


Figure  2  illustrates  the  issue  for  a  sequence  of  MLBL  corrections  for  a  biased 
compass.  It  can  be  seen  that  the  angle  of  the  correction  is  consistently  in  the  hypoth¬ 
esized  bias  direction.  However  as  the  CNA  consistently  maneuvers  relative  to  the 
AUV,  a  closed  form  expression  for  the  bias  angle  cannot  be  formed. 

Instead  we  will  propose  to  successively  estimate  the  bias  until  this  effect  is 
removed.  Consider  the  net  angular  correction  set  of  N  successive  corrections, 
(%_n+  i ,...%).  We  assume  that  the  median  of  this  set,  given  by  %,  will  be  in  the 
direction  of,  but  less  than,  the  bias  angle,  i.e.  [0  ^  <  0bias]. 

This  value  is  assumed  to  be  an  initial  estimate  of  the  bias  and  used  to  correct  the 
heading  estimate  subsequently.  After  the  next  N  corrections,  any  remaining  bias  is 
again  estimated  and  added  to  the  running  bias  estimate.  Eventually  the  bias  will  be 
assumed  to  be  known  and  can  be  removed. 


4  Experiments 

A  number  of  experiments  were  carried  out  in  the  Charles  River,  adjacent  to  MIT, 
to  demonstrate  the  concept  of  Moving  Long  Baseline  using  the  Surface  Crafts  for 
Oceanographic  and  Undersea  Testing  (SCOUT)  kayaks  designed  in  MIT  and  the 
low-cost  Iver2  from  Oceanserver  (see  Figure  4).  Each  of  the  kayaks  was  equipped 
with  a  WHOI  modem,  a  compass  and  a  GPS  sensor  while  the  Iver’s  basic  sensor 
suite  consisted  of  only  a  compass  and  a  WHOI  modem.  The  Iver2’s  only  velocity 
estimate  was  a  constant  value  of  1.028  m/s  (2  knots)  specified  by  the  mission  plan. 

Each  vehicle’s  onboard  computer  ran  an  implementation  of  the  MOOS  software 
platform  [7] .  Maintaining  an  accurately  synchronized  clock  is  essential  for  the  esti¬ 
mation  of  inter  vehicle  ranges;  to  do  so  the  Iver2  utilized  a  precisely  synchronized 
timing  board  developed  by  Eustice  et  al.  [4]  while  the  SCOUT  kayaks  used  the 
Plus-Per-Second  (PPS)  contained  within  its  received  GPS  data  messages. 

Experiment  1:  A  single  SCOUT  kayak  designated  as  the  ‘AUV’  completed  a 
survey-type  mission  while  another  kayak  maintained  a  zig-zag  pattern  behind  the 
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‘AUV’  —  taking  on  the  CNA  role.  The  onboard  GPS  sensor  was  used  to  determine 
the  ground  truth  position  as  well  as  to  simulate  forward  and  starboard  velocities. 
Measurements  drawn  from  the  CNA  transmissions  were  used  by  the  ‘AUV’  to  re¬ 
duce  its  uncertainty.  The  designated  ‘AUV’  carried  out  1.5  circuits  of  a  rectangle, 
covering  approximately  1800  metres  in  total  over  a  period  of  37  minutes. 

Note  the  large  increase  in  the  error  of  the  position  measurement  between  22- 
26  minutes.  This  was  caused  by  a  combination  of  poor  CNA  position  estimation 
(caused  by  visibility  of  just  4  GPS  satellites)  and  the  CNA  moving  close,  yet  parallel, 
to  the  AUV.  It  is  envisaged  that  this  could  have  been  avoided  with  the  use  of  a  more 
accurate  GPS  unit  or  by  forbidding  the  CNA  from  taking  such  a  trajectory. 

The  following  are  a  number  of  metrics  for  this  test:  mean  error  12.5m,  mean 
‘AUV’  velocity  0.82m/s,  mean  CNA  velocity  1.08m/s.  There  were  205  transmis¬ 
sions  of  which  130  were  fully  successful,  63  resulted  in  a  failed  packet  transmission 
but  a  successful  range  estimate  while  12  resulted  in  complete  transmission  failure. 
The  algorithm  can  be  seen  to  bound  the  error  of  the  position  estimate  to  approxi¬ 
mately  20m. 

Experiment  2:  In  a  second  fully  realistic  experiment,  the  Iver2  carried  out  a  pre¬ 
defined  ‘lawnmower’  pattern  running  at  a  depth  of  2.4m  while  again  the  SCOUT 
kayak  supported  by  transmitting  its  GPS  position  to  the  AUV  via  the  WHOI  mo¬ 
dem.  In  addition  the  Iver2  transmitted  its  own  MLBL  position  estimate,  which  was 
received  by  the  CNA  and  used  to  plan  locations  from  which  to  transmit. 

Figure  6(b)  illustrates  the  path  taken  by  the  vehicles.  The  test  lasted  28  minutes 
and  in  total  the  Iver2  travelled  2  km.  The  AUV  surfaced  twice  as  a  safety  precaution. 
After  9  minutes  the  AUV  first  surfaced  and  received  a  GPS  fix  at  (-201.6  -242.0)  as 
shown  as  a  red  cross,  at  that  time  the  front  seat  filter  estimated  a  position  of  (-258.7,- 
276.5)  while  the  MLBL  filter  estimate  (-208.9,-238.1)  giving  an  error  of  66.7m 
and  8.3m  error  respectively.  When  the  Iver  surfaced  for  the  second  time  (after  19 
minutes),  the  corresponding  errors  were  53.7m  and  14.1m.  As  well  as  estimating  the 
AUV  position  with  error,  both  of  these  MLBL  filter  estimates  were  within  a  95% 
confidence  interval  upon  surface.  Note  that  after  each  surface  the  AUV  transited 


Fig.  3  Error  (left)  and  95%  confidence  (right)  for  the  MLBL  algorithm  (blue)  and  the  dead  reckon¬ 
ing  alone  (green)  for  Experiment  1  where  we  have  defined  %90  confidence  in  terms  of  the  largest 
eigenvalue  of  the  covariance  matrix. 
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from  the  GPS  location  back  to  its  planned  location  on  the  mission  path  before  diving 
and  continuing  the  mission. 

It  should  be  mentioned  that  between  4-8  and  12-1 8  minutes  no  packets  were  suc¬ 
cessfully  received  by  the  AUV  and  as  a  result  no  MLBL  corrections  were  possible 
(See  Figure  5).  This  can  be  attributed  to  a  number  of  factors 

•  The  CNA  was  positioned  behind  the  AUV  and  as  a  result  churned  water  from  the 
AUV  propeller  is  likely  to  have  reduced  communication  capabilities. 

•  With  each  failed  transmission  the  AUV/CNA  range  grew  until  about  225m  which 
is  considered  long  for  this  experimental  river  environment 3 . 

•  The  presence  of  a  tourist  cruise  ship  nearby. 

In  future  tests,  precautions  will  be  taken  to  avoid  these  issues. 


Fig.  4  Vehicles  Used:  OceanServer  Iver2  (left)  and  the  MIT  Scout  kayak  (right) 


Fig.  5  Results  for  Experiment  2.  Left:  Modem  range  estimates  with  successful  packet  transmission 
(red  dots)  and  modem  range  estimates  but  failed  packet  transmission  (black  crosses).  Right:  95% 
confidence  for  the  MLBL  algoritm  (blue)  and  the  dead  reckoning  along  (green).  Note  the  two  long 
portions  of  the  run  in  which  ranges  were  determined  but  no  packet  was  successfully  transmitted 
and  the  resultant  growth  in  position  uncertainty. 


3  Note  that  the  maximum  range  of  the  WHOI  modem  in  the  open  ocean  is  estimated  to  be  of  the 
order  of  4-5  times  greater  than  the  river  environment. 
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The  concept  of  a  single  surface  vehicle  supporting  the  localization  of  an  AUV  has 
been  outlined.  Full  experimental  results  with  a  single  CNA  supporting  an  Iver2  were 
presented.  The  resultant  position  estimate  was  shown  to  be  substantially  more  accu¬ 
rate  than  the  vehicle’s  own  onboard  navigation  filter.  Future  work  will  focus  on  ex¬ 
tending  this  framework  for  testing  with  three  Iver2  vehicles  and  eventually  towards 
the  scenario  in  which  a  set  of  heterogeneous  vehicles  are  continuously  submerged 
with  only  a  single  vehicle  occasionally  surfacing  to  access  the  GPS. 

Secondly,  the  performance  of  the  algorithm  is  directly  determined  by  the  quality 
and  frequency  of  received  measurements.  We  will  consider  the  optimization  of  the 
transmitted  messages  (and  the  re-transmission  of  failed  data)  so  as  to  reduce  the 
proportion  of  useless  or  partial  messages  received  by  the  AUV.  In  this  work  the  path 
taken  by  the  CNA  was  an  arbitary  zig-zag  behind  the  AUV.  Motion  planning  of  the 
CNA’s  path  —  so  as  to  transmit  messages  from  the  most  adventagous  location  — 
will  also  be  carried  out  in  future. 


References 

1.  Alex  Bahr  and  John  J.  Leonard.  Cooperative  localization  for  autonomous  underwater  vehicles. 
In  Proceedings  of  the  10th  International  Symposium  on  Experimental  Robotics  (ISER),  Rio  de 
Janeiro,  Brasil,  July  2006. 

2.  Joseph  Curcio,  John  J.  Leonard,  Jerome  Vaganay,  Andrew  Patrikalakis,  Alexander  Bahr,  David 
Battle,  Henrik  Schmidt,  and  Matthew  Grund.  Experiments  in  moving  baseline  navigation  using 
autonomous  surface  craft.  In  Proceedings  of  MTS/IEEE  Oceans ,  volume  1,  pages  730-735, 
September  2005. 

3.  Amaud  Doucet,  Nando  de  Freitas,  and  Neil  Gordon,  editors.  Sequential  Monte  Carlo  methods 
in  practice.  Springer- Verlag,  2000. 

4.  Ryan  M.  Eustice,  Louis  L.  Whitcomb,  Hanumant  Singh,  and  Matthew  Grund.  Experimental 
results  in  synchronous-clock  one-way-travel-time  acoustic  navigation  for  autonomous  under¬ 
water  vehicles.  In  IEEE  International  Conference  on  Robotics  and  Automation  (ICRA),  Rome, 
Italy,  April  2007. 

5.  Darren  K.  Maczka,  Aditya  S.  Gadre,  and  Daniel  J.  Stilwell.  Implementation  of  a  cooperative 
navigation  algorithm  on  a  platoon  of  autonomous  underwater  vehicles.  Oceans  2007,  pages 
1-6,  2007. 

6.  Anastasios  I.  Mourikis  and  Stergios  I.  Roumeliotis.  Performance  analysis  of  multirobot  coop¬ 
erative  localization.  IEEE  Transactions  on  Robotics,  22(4): 666-681,  2006. 

7.  Paul  M.  Newman.  MOOS  -  a  mission  oriented  operating  suite.  Technical  Report  Tech.  Rep. 
OE2003-07,  2003. 

8.  Edwin  Olson,  John  J.  Leonard,  and  Seth  Teller.  Robust  range-only  beacon  localization.  IEEE 
Journal  of  Oceanic  Engineering,  3 1(4):  949-958,  October  2006. 

9.  Jerome  Vaganay,  John  J.  Leonard,  Joseph  A.  Curcio,  and  J.  Scott  Willcox.  Experimental  vali¬ 
dation  of  the  moving  long  base  line  navigation  concept,  pages  59-65,  June  2004. 


10 


Maurice  F.  Fallon,  Georgios  Papadopoulos  and  John  J.  Leonard 


(a) 


(b) 


Fig.  6  Paths  taken  by  the  AUV  and  CNA  during  Experiment  1  (upper)  and  2  (lower),  see  Section 
4  for  more  details.  CNA  measurements  were  transmitted  from  the  black  dots.  Note  that  the  final 
500m  of  Experiment  1  has  been  omitted  as  it  overlaps  with  what  is  shown. 
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Abstract  Exploration  of  an  unknown  environment  is  a  fundamental  concern  in 
mobile  robotics.  This  paper  presents  an  approach  for  cooperative  multi-robot  ex¬ 
ploration,  fire  searching  and  mapping  in  an  unknown  environment.  The  proposed 
approach  aims  to  minimize  the  overall  exploration  time,  making  it  possible  to  lo¬ 
cate  fire  sources  in  an  efficient  way.  In  order  to  achieve  this  goal,  the  robots  should 
cooperate  in  an  effective  way,  so  they  can  individually  and  simultaneously  explore 
different  areas  of  the  environment  while  they  identify  fire  sources.  The  proposed 
approach  employs  a  decentralized  frontier  based  exploration  method  which  evalu¬ 
ates  the  cost-gain  ratio  to  navigate  to  target  way -points.  The  target  way-points  are 
obtained  by  an  A*  search  variant  algorithm.  The  potential  field  method  is  used  to 
control  the  robots’  motion  while  avoiding  obstacles.  When  a  robot  detects  a  fire, 
it  estimates  the  flame’s  position  by  triangulation.  The  communication  between  the 
robots  is  done  in  a  decentralized  control  manner  where  they  share  the  necessary 
data  to  generate  the  map  of  the  environment  and  to  perform  cooperative  actions  in 
a  behavioural  decision  making  way.  This  paper  presents  simulation  and  experimen¬ 
tal  results  of  the  proposed  exploration  and  fire  search  method  and  concludes  with  a 
discussion  of  the  obtained  results  and  future  improvements. 

1  INTRODUCTION 

Search  operations  inside  buildings,  caves,  tunnels  and  mines  are  sometimes  ex¬ 
tremely  dangerous  activities.  The  use  of  autonomous  robots  to  perform  such  tasks 
in  complex  environments  will  reduce  the  risk  of  these  missions.  Exploration  of  an 
unknown  environment  is  a  fundamental  issue  in  mobile  robotics.  As  autonomous 
exploration  and  map  building  becomes  increasingly  robust  with  a  single  robot,  the 
next  stage  is  to  extend  these  techniques  to  teams  of  robots.  Using  multiple  robot  sys¬ 
tems  may  potentially  provide  several  advantages  over  single  robot  systems,  namely 
speed,  accuracy,  and  fault  tolerance  [1],  [2],  [3]  and  [4]  .  Nowadays,  swarm  based 
exploration  and  mapping  where  the  robots  can  be  smoothly  added  or  removed  to  the 
operation  is  an  area  with  increasing  interests  to  the  robotics  community  [1]. 

This  study  is  a  part  of  a  European  project  named  Guardians1.  The  Guardians  are 
a  swarm  of  autonomous  robots  applied  to  navigate  and  search  an  urban  environ¬ 
ment.  The  project’s  central  example  is  search  and  rescue  in  an  industrial  warehouse 
in  smoke,  as  proposed  by  the  Fire  and  Rescue  Service  of  South  Yorkshire.  The  job 
is  time  consuming  and  dangerous;  toxins  may  be  released  and  human  senses  can  be 

This  work  was  partially  supported  by  European  project  GUARDIANS  contract  FP6-IST-045269  as 
well  as  by  the  Portuguese  Foundation  for  Science  and  Technology  contract  SFRH/B D/45740/2008. 

{ali,  jgnunes, lino, adealmeida}@isr .uc.pt 

1  http://www.guardians-project.eu 
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severely  impaired.  They  get  disoriented  and  may  get  lost.  The  robots  warn  for  toxic 
chemicals,  provide  and  maintain  mobile  communication  links,  infer  localization  in¬ 
formation  and  assist  in  searching.  Map  exploration  and  fire  source  detection  are  the 
topics  in  this  paper. 

The  problem  of  coordination  and  control  of  multiple  robots  for  mapping  and 
exploration  has  been  almost  addressed  through  several  research  approaches.  Most 
approaches  rely  on  centralized  control  to  direct  each  vehicle  in  the  swarm.  This 
centralized  approach  has  been  popular  in  the  robotics  community,  because  it  allows 
near  optimal  behaviours  in  well  understood  environments.  However,  its  performance 
decreases  in  new  unidentified  environments.  Yamauchi  [5]  proposed  a  distributed 
method  for  multi-robot  exploration,  yielding  a  robust  solution  even  with  the  loss  of 
one  or  more  vehicles  in  the  swarm.  A  key  aspect  of  this  approach  involves  sharing 
map  information  among  the  robotic  agents  so  they  execute  their  own  exploration 
strategy,  independently  of  all  other  agents.  While  this  technique  effectively  decen¬ 
tralizes  control,  exchange  of  map  information  is  not  enough  to  prevent  inefficient 
cooperative  behaviours.  This  approach  also  required  known  starting  positions  and 
failed  to  provide  a  robust  mechanism  for  map  merging  [6]. 

Simultaneous  localization  and  mapping  (SLAM)  has  been  a  topic  of  much  inter¬ 
est  because  it  provides  an  autonomous  vehicle  with  the  ability  to  discern  and  rep¬ 
resent  its  location  in  a  feature  rich  environment.  Some  of  the  statistical  techniques 
used  in  SLAM  include  Kalman  filters,  particle  filters  (Monte  Carlo  methods)  and 
scan  matching  of  range  data.  But  if  there  is  a  local  or  global  localization  system 
where  robots  know  their  relative  positions,  SLAM  techniques  are  not  required. 

There  are  some  proposed  methods  for  exploration  based  on  cooperation  be¬ 
tween  agents.  Several  researchers  have  suggested  stigmergy  methods  [7]  and  [6]. 
The  central  theme  of  the  study  presented  in  [7]  is  the  use  of  stigmergy  to  achieve 
effects-based  control  of  cooperating  unmanned  vehicles.  They  accomplished  stig¬ 
mergy  through  the  use  of  locally  executed  control  policies  based  upon  potential 
field  formulas.  Nevertheless,  this  method  is  mainly  useful  when  there  are  a  lot  of 
small  robots  working  together. 

Most  of  the  existing  approaches  to  coordinate  multi-robot  exploration  assume 
that  all  agents  know  their  locations  in  a  shared  (partial)  map  of  the  environment. 
Effective  coordination  can  be  achieved  by  extracting  exploration  frontiers  from  the 
partial  map  and  assigning  robots  to  frontiers  based  on  a  global  measure  of  perfor¬ 
mance  [1],  [2],  [3]  and  [8].  Frontiers  are  the  borders  of  the  partial  map,  between 
explored  free  space  and  unexplored  areas  [2].  These  borders,  thus,  represent  loca¬ 
tions  that  are  reachable  from  within  the  partial  map  and  provide  opportunities  for 
exploring  unknown  terrain,  thereby  allowing  the  robots  to  greedily  maximize  infor¬ 
mation  gain  [9].  Compared  to  the  problems  occurring  in  single  robot  exploration, 
the  extension  to  multiple  robots  poses  new  challenges,  including: 

Coordination  and  cooperation:  Since  there  are  several  robots  working  in  the 
same  area,  they  must  have  some  kind  of  cooperation  with  each  other  in  order  to 
prevent  collisions  and  share  tasks.  Effective  cooperation  can  be  achieved  by  guiding 
the  robots  into  different  non-overlapping  areas  [2],  [3],  [10].  The  idea  is  that  at  a 
given  time  each  robot  should  be  dedicated  to  exploring  one  and  only  one  frontier. 
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Integration  of  information  collected  by  different  robots  into  a  single  map: 

The  main  goal  of  exploration  is  to  build  a  general  map  representing  the  environment. 
The  robots  should  integrate  all  the  data  into  a  single  map.  Map  merging  is  a  big 
challenge  in  this  field  that  has  been  address  in  several  studies  [11]. 

Uncertainty  in  localization  and  sensing:  The  effect  of  sensor  errors  (“noise”) 
and  errors  in  sensing  the  gradient  of  a  “resource  profile”  (e.g.,  a  nutrient  profile) 
should  be  considered.  Several  researchers  have  illustrated  that  the  agents  can  forage 
in  noisy  environments  more  efficiently  as  a  group  than  individually  [12],  [13]. 

Decision  making,  reasoning,  task  sharing  and  navigation:  Decision  making 
for  each  robot  in  an  unknown  environment  is  a  very  complex  problem.  Since  no¬ 
body  knows  what  lies  beyond  the  frontier  of  an  unexplored  area,  there  is  no  unique 
optimum  algorithm  that  is  completely  reliable.  In  each  situation,  a  robot  should 
make  a  decision  to  progress  exploring  task  based  on  partial  existing  map  and  also 
the  other  robots’  positions  and  objectives. 

In  terms  of  decision  making  algorithms,  a  lot  of  studies  for  multi-robot  explo¬ 
ration  do  not  address  unknown  environments.  Moreover,  most  of  the  research  in 
this  field  is  based  on  centralized  control  of  the  agents.  More  significant  approaches 
for  multi-robot  exploration  have  been  presented  in  [14]  and  [5].  In  both  techniques, 
the  robots  share  a  common  map  which  is  built  during  the  exploration.  Singh  and 
Fujimura  [14]  presented  a  decentralized  online  approach  for  heterogeneous  robots. 
Most  of  the  time,  the  robots  work  independently.  When  a  robot  finds  a  situation  that 
is  difficult  to  resolve  by  itself,  it  will  send  the  problem  to  another  robot  which  is 
likely  to  be  able  to  resolve  the  situation.  The  candidate  robot  is  chosen  by  trading 
off  the  number  of  areas  to  be  explored,  the  size  of  the  robot  and  the  straight-line 
distance  between  the  robot  and  the  target  region.  This  technique  generates  a  grid  ge¬ 
ometric  map;  therefore,  the  accuracy  of  the  map  depends  on  the  grid  size.  Moreover, 
all  the  robots  need  to  have  a  huge  memory  to  keep  the  entire  map.  In  the  approach 
of  Yamauchi  [5],  the  robots  move  to  the  closest  frontier,  which  is  the  closest  un¬ 
known  area  around  the  robot  according  to  the  current  map.  However,  there  is  no 
coordination  component  which  chooses  different  frontiers  for  the  individual  robots. 

Our  approach,  in  contrast,  is  specifically  designed  to  coordinate  the  robots  so  that 
they  automatically  do  not  choose  the  same  frontier.  It  generates  a  topological  map 
which  needs  much  less  memory  capacity.  As  a  result,  this  method  needs  significantly 
less  time  to  accomplish  the  task.  The  objective  in  this  research  is  to  generate  the  map 
of  an  unknown  environment  and  also  localize  all  the  fire  sources  in  the  area.  In  fact, 
the  final  future  goal  is  to  create  a  fire  risk  map  of  an  unknown  environment  with 
multiple  robots.  We  do  not  address  the  problems  of  risk  maps  here. 

During  the  exploration  process,  if  there  is  a  fire  source,  robots  should  report  it. 
The  authors  have  addressed  this  issue  in  previous  papers  [15],  [16],  [17]  and  [18]. 
The  last  achievement  of  that  research  is  kheNose.  The  kheNose  is  a  device  devel¬ 
oped  by  the  authors  to  sense  olfactory  information  through  the  use  of  gas  sensors, 
anemometers,  a  temperature  and  humidity  sensor  [19].  In  the  current  study,  the  last 
version  of  kheNose  has  been  used  to  detect  the  fire  sources. 

Collision  avoidance  between  the  robots  during  the  exploration  is  a  considerable 
issue  that  has  not  been  addressed  pragmatically  in  the  previous  studies.  In  this  study, 
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Fig.  1  The  entire  system  schematic 


we  propose  a  new  practical  method  for  multi-robot  unknown  environment  explo¬ 
ration  with  fire  source  detection  which  takes  “collision  avoidance”  and  “task  shar¬ 
ing”  into  consideration.  This  method  has  been  tested  in  the  real  world  and  also  in 
simulation.  The  effect  of  complexity  of  the  environment  and  also  the  numbers  of 
robots  are  the  main  parameters  that  have  been  studied  in  this  paper. 

2  The  Proposed  Method 

This  section  explains  the  concept  of  the  proposed  multi-robot  cooperation  tech¬ 
nique.  This  method  is  illustrated  in  schematic  diagram  Fig.  1,  in  a  global  perspective. 
As  shown  in  the  diagram,  the  method  includes  three  main  tasks:  navigation  and  ex¬ 
ploration,  decision  making,  and  fire  source  detection.  The  following  parts  describe 
some  of  the  elements  of  the  diagram  (Fig.  1). 

2.1  Decision  making 

The  main  goal  of  the  exploration  process  is  to  cover  the  whole  environment  in  the 
minimum  possible  time.  Therefore,  it  is  essential  that  the  robots  share  their  tasks 
and  individually  achieve  the  objectives  through  optimal  paths.  In  an  unknown  envi¬ 
ronment,  the  immediate  goals  are  the  frontiers.  Most  of  the  time,  when  the  robots 
are  exploring  the  area,  there  are  several  unexplored  regions,  which  poses  a  problem 
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Algorithm  1 !  Map  exploration  algorithm  -  Decision  making  method 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 


Receive_map_from_server() 

while  there  is  at  least  one  unexplored  link  in  the  map  do 

do  Follow  the  potential  field  algorithm  until  (getting  a  different  feature  in  the  environment) 
Receive_map_from_server() 
if  the  new  node  exists  in  the  map  then 
j  Update  the  map’s  data  with  new  information. 

|_  Send_map_to_the_server() 

else 

j  Add_new_node_to_map() 

|_  Send_map_to_the_server() 

if  the  current  node  has  any  unexplored  link  then 

Calculate  direction  gain  of  taking  each  unexplored  link,  based  on  the  position  of  the  other  robots  and 

get  the  higher  gain  direction  to  follow 

else 

Determine  the  best  not-assigned  unexplored  frontier,  based  on  their  gain  /  cost. 

Assign  the  frontier  to  the  robot. 

Calculate  the  best  path  to  take,  based  on  the  A*  algorithm  and  get  that  direction  to  follow. 


17  End  of  algorithm 


of  how  to  assign  specific  frontiers  to  the  individual  robots.  We  want  to  avoid  send¬ 
ing  several  robots  to  the  same  frontier,  which  may  result  in  collision  concerns.  The 
other  issue  is  that  we  do  not  want  to  have  a  central  station  for  moderating  the  robots. 
To  address  these  problems,  the  proposed  method  is  based  on  a  decision-theoretic 
exploration  strategy.  The  decision  making  algorithm  is  shown  in  Algorithm  1 . 

The  frontier  is  selected  based  on  the  cost  of  reaching  it  and  the  utility  it  can 
provide  to  the  exploration.  The  cost  is  calculated  through  the  A*  method  which 
simultaneously  determines  the  optimal  path  that  has  to  travel  to  reach  the  frontier 
and  its  distance. 

The  utility  depends  on  the  number  of  the  robots  and  their  proximity  to  the  fron¬ 
tier,  which  means  that  if  there  are  several  frontiers  at  similar  distances,  the  robot  will 
go  to  the  one  that  has  higher  utility.  This  procedure  will  make  the  robots  disperse 
and  explore  the  environment  in  a  more  efficient  way. 

2.2  Task  sharing  and  map  generation 

The  cooperation  between  the  robots  permits  the  exchange  of  data,  allowing  for  task 
sharing  and,  consequently,  an  efficient  distributed  exploration.  During  the  explo¬ 
ration,  there  is  only  one  global  shared  map  in  the  system.  This  map  is  in  a  server 
that  sends  and  receives  the  map  to  the  robots  whenever  they  request  it.  Within  this 
map,  besides  having  some  information  regarding  the  kind  of  nodes  and  their  posi¬ 
tion,  it  also  has  data  describing  the  location  of  the  robots  and  their  frontier  target, 
as  can  be  seen  in  Fig.  2.  Through  this  data,  a  robot  can  see  which  frontiers  are 
unexplored,  their  position  and  if  any  robot  has  targeted  them  as  its  objective,  thus 
allowing  a  distributed  efficient  exploration  (see  Algorithm  1  and  Fig.  1). 

In  dealing  with  multiple  robots  in  one  environment,  collision  between  robots  is  a 
very  important  aspect.  For  instance,  two  robots  might  be  in  a  narrow  corridor  with 
different  directions  and  they  may  want  to  pass  but  cannot  because  they  are  facing 
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Algorithm  2!  Collision  avoidance  between  the  robots(and  increasing  efficiency  of  collaborating) 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 


Calculate  a  confined  circular  area  around  the  robot 
if  any  other  robot  is  inside  the  circle  then 

Determine  if  the  robots  are  in  a  collision  pattern 
if  there  is  a  possibility  of  being  in  a  collision  pattern  then 
//Follow  the  rules  of  engagement: 
if  they  are  in  a  direct  collision  path  then 
reevaluate  their  goals, 
else 

if  they  are  both  currently  exploring  frontiers  OR  they  are  both  moving  inside  explored 
area  then 

give  priority  to  the  one  which  has  lowest  ID. 

else 


|_  Give  priority  to  the  one  that  is  exploring  a  frontier. 


13  else 

14  |  Continue  exploration  algorithm. 

15  End  of  algorithm 


Numberof  explorednodes 
V  Topological  Map  - 

3  


Connection  to  other  nodes: 

-  -1  meansit  is  unexplored; 

-  -2  meansit  hasno  connections; 

-  And  the  other  numbers  specify  to 
which  node  it  is  connected. 


144  28  151  28 
151  28  183  27 
183  27  187  52 
187  52  187  86 
187  86  225  98 


1  -1 
2  0 
-2  1 
-2  -2 
5  -2 


-1  -1 
-2  -2 

3  -2 

4  2 
-1  3. 


N 

Contains  the  ID 
of  the  robot 
which  is  going 
to  that  node. 


Start  and  End  p  o  sition 
Type  ofthe  node 
Index  of  the  node 


Contains  the  ID  of  the  Robot 
cunently  occupying  the  nodes, 
therefore  giving  the  position  of 
the  robots 


Fig.  2  Example  of  topological  map  data  Fig.  3  Khepera  III  and  kheNose 


each  other  or  they  may  even  treat  each  other  as  a  dead  end.  It  is  necessary  to  avoid 
these  types  of  problems.  Therefore,  we  have  implemented  some  rules  of  engagement 
justly  in  order  to  prevent  these  types  of  situations.  If  two  robots  are  going  against 
each  other  in  a  corridor,  it’s  because  the  task  sharing  has  not  been  done  previously 
in  a  efficient  way.  These  rules  of  engagement  are  shown  in  Algorithm  2. 

2.3  Fire  source  detection 

During  exploration  and  navigation,  the  robots  are  simultaneously  acquiring  infor¬ 
mation  from  the  environment  (see  Fig.  1).  All  the  robots  are  equipped  with  a  board 
developed  by  this  research  group,  which  integrates  temperature  and  chemical  sen¬ 
sors  named  kheNose  (Fig. 3). 

When  the  robots  are  mapping  the  environment,  they  are  constructing  the  map 
and  verifying  if  the  current  node  they  have  acquired  is  not  already  on  the  map,  thus, 
assuring  the  coherence  of  the  map  and  making  the  merging  process  simple,  where 
most  of  the  time  it  is  only  necessary  to  add  new  nodes  to  the  global  shared  map. 
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Fig.  4  Real  maze  experiment 


Fig.  5  Visual  positioning  application  screenshot 


They  use  an  eight  element  thermopile  array  sensor  to  measure  the  absolute  tem¬ 
perature  as  well  as  the  ambient  temperature  on  the  robot  to  be  able  to  distinguish 
the  heat  values.  When  the  temperature  data  received  from  the  sensor  informs  that 
there  is  an  increase  in  the  temperature  or  any  evidence  of  existence  of  a  fire  source 
in  the  environment,  the  robot  identifies  it  as  a  heat  source  clue  and  proceeds  to  make 
a  sequence  of  movements  to  acquire  more  accurate  values  in  order  to  be  certain  that 
it  isn’t  a  random  value.  After  the  fire  source  is  determined,  the  robot  performs  a 
triangulation  method  to  estimate  its  actual  position. 

3  Experiments 

As  a  test  plan,  we  built  several  maze-like  environments  which  are  combinations  of 
corridors,  corners,  crossings,  T-junctions  and  dead-ends  (one  of  them  is  shown  in 
Fig.  4).  Three  Khepera  III  robots  are  used  for  testing  the  algorithm.  These  Khepera 
III  robots  are  able  to  run  the  algorithms  mentioned  in  the  previous  section. 

Robots  should  know  their  positions  in  the  environment  to  be  able  to  explore  and 
navigate.  Therefore  a  network  camera  is  mounted  on  the  top  of  the  environment  and 
an  image  processing  computer  program  is  able  to  track  and  locate  each  robot.  Each 
robot  has  two  coloured  labels  on  the  top  that  can  be  seen  by  the  camera.  The  cam¬ 
era  is  connected  to  the  network  and  an  image  processing  program  tracks  the  robots’ 
position  and  provides  the  absolute  position  of  each  robot  via  wireless  network.  Im¬ 
age  processing  program  is  an  object  tracking  application  developed  by  the  authors. 
By  recognizing  the  center  of  each  coloured  label  and  calculating  the  line  crossing 
from  these  two  centers,  the  orientation  of  the  robot  can  be  computed.  The  program 
is  written  in  C++.  Fig. 5  shows  a  screen  shot  of  this  program. 

In  terms  of  feature  extraction,  based  on  values  measured  by  sonar  and  infrared 
sensors,  the  robot  recognizes  the  features  and  should  take  an  action  and  modify  the 
shared  map;  it  will  save  this  data  in  the  map  structure  as  a  new  node,  and  will  also 
update  the  data  related  to  the  previous  feature.  For  each  feature,  the  robot  saves  the 
data  in  the  topological  map,  including  start  position  and  end  position  of  that  node 
and  some  other  information  (Fig. 2). 

The  system  has  been  tested  with  different  start  positions  for  the  robots  in  different 
maze  structures.  There  is  a  small  candle  acting  as  a  heat  source  in  the  environment 
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Fig.  6  A  maze  with  34  nodes  Fig.  7  82  nodes  Fig.  8  135  nodes 


Fig.  9  Number  of  repeated  nodes,  Comparing  Fig.  10  Test  of  various  numbers  of  robots  against 
the  results  of  the  proposed  method  with  optimal  complexity  of  the  environment,  1 :  maze  in  Fig.6, 
method  2:  maze  in  Fig. 7,  3:  maze  in  Fig. 8 


which  robots  try  to  locate.  All  of  the  robots  are  equipped  with  kheNose  boards  for 
heat  source  detection. 

The  algorithm  has  been  tested  in  real  world  and  also  in  a  simulation  world.  For 
optimizing  the  exploration  algorithm  and  measuring  its  performance,  we  used  the 
Player/Stage  simulator  [20] .  In  the  real  world,  there  are  a  lot  of  constraints  that  do 
not  allow  for  testing  the  proposed  method  very  easily.  It  is  not  effortless  to  build 
various  test  plans  with  different  scales  for  testing  and  developing  the  method.  Since 
there  is  no  reliable  simulator  for  fire  and  smoke  in  Player/Stage,  the  whole  system 
has  been  tested  in  the  real  world. 

Fig.  4  shows  two  robots  exploring  a  small  maze  and  finding  a  fire  source.  Both 
robots  started  from  the  same  point  but  not  at  the  same  time.  We  intentionally  ran 
one  of  the  robots  a  few  seconds  after  the  first  one.  The  red  footprint  shows  the  first 
robot’s  path  and  the  blue  footprint  is  related  to  the  second  robot.  As  shown,  the  first 
robot  found  the  fire  source.  For  an  example  of  the  coordination  algorithm,  when  the 
second  robot  reached  the  junction  it  figured  out  that  the  path  in  the  front  was  already 
explored  and  it  chose  the  right  path. 

Since  there  is  no  accepted  standard  benchmark,  measuring  the  performance  of  a 
behavioural  based  multi-robot  unknown  area  exploration  algorithm  is  a  very  difficult 
job.  One  of  the  possible  ways  to  do  that  is  to  compare  the  proposed  method  with 
the  optimal  possible  method.  But  the  issue  is  that  there  is  no  optimal  method  for 
exploring  an  unknown  world.  However,  there  is  an  optimal  solution  for  minimizing 
the  travelling  path  if  the  world  (maze)  is  completely  known  before  exploration. 
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The  algorithm  has  been  tested  with  different  number  of  robots  in  a  specific  maze. 
The  model  of  that  maze  is  also  given  to  the  optimal  method  and  then  we  compared 
the  results  of  the  proposed  algorithm  with  the  optimal  method.  Since  the  optimal 
method  has  the  world’s  model  but  the  proposed  method  is  exploring  the  unknown 
world,  it  is  obvious  that  the  results  of  the  proposed  method  are  always  worse  than 
optimal  method  but  it  can  be  a  good  criteria  for  evaluating  the  method. 

The  number  of  repeated  nodes  during  travel  can  be  a  good  parameter  for  measur¬ 
ing  the  performance  of  the  method.  A  repeated  node  is  a  node  that  robots  pass  more 
than  once.  Fig.  9  shows  the  number  of  nodes  that  have  been  repeated  more  than  once 
in  the  optimal  method  as  well  as  in  the  proposed  algorithm  for  the  maze  shown  in 
Fig.  6.  A  good  conclusion  from  the  graph  in  Fig.  9  is  that  there  is  a  trade-off  between 
the  number  of  robots  and  the  size  of  the  world.  It  shows  that  the  proposed  approach 
is  acceptably  comparable  with  the  optimal  method. 

Another  parameter  for  evaluation  of  the  method  is  the  exploration  time.  The  pro¬ 
posed  method  has  been  tested  with  a  different  number  of  robots  in  different  mazes. 
The  environment  shown  in  Fig.  5  that  is  a  3.5  x  4  meters  maze  is  tested  by  one, 
two  and  three  Khepera  robots  separately.  One  robot  could  explore  the  environment 
in  412  seconds.  This  environment  has  been  explored  by  two  robots  in  254  seconds. 
The  exploration  time  for  the  same  maze  with  three  Kheperas  was  212  seconds.  Each 
result  is  the  average  of  five  similar  tests.  Different  tests  with  constant  conditions  had 
similar  results  with  about  seven  percent  variance.  In  all  the  tests,  the  maximum 
speeds  of  the  Kheperas  are  kept  constant. 

In  simulation,  the  mazes  shown  in  Fig. 6,  Fig. 7  and  Fig. 8  have  been  tested  sep¬ 
arately  with  one,  two,  three  and  four  robots  and  the  results  are  shown  in  Fig.  10. 
The  graph  shows  the  average  of  five  tests  for  each  data.  The  variance  was  less  than 
one  percent.  It  is  obvious  that,  with  more  robots,  exploration  time  will  be  improved. 
Another  conclusion  from  the  graph  is  that  having  more  robots  is  more  advantageous 
in  a  complex  maze  than  in  a  simple  maze.  This  also  proves  that  the  cooperation 
algorithm  in  this  approach  is  efficiently  functional. 

In  the  real  experiment  the  robots  could  locate  the  fire  sources  during  the  ex¬ 
ploration.  The  performance  of  fire  source  detection  has  been  addressed  in  previous 
studies  by  the  group  [17],  [19]. 

4  Conclusions  and  future  works 

The  proposed  method  for  multi-robot  unknown  environment  exploration  has  been 
implemented  and  experimented  in  the  test  plans.  The  robots  are  able  to  cooperate 
and  create  a  shared  topological  map  of  the  unknown  environment.  Cooperation  be¬ 
tween  the  robots  is  done  by  sharing  information  in  the  shared  map.  The  algorithm 
has  been  tested  against  several  different  configurations  in  Player/Stage  simulation 
program.  The  exploration  algorithm  is  merged  with  fire  source  detection  algorithm 
and  has  been  tested  in  the  real  world.  The  effect  of  the  number  of  the  robots  on 
exploration  in  different  type  of  environment  has  been  discussed.  The  results  show 
the  efficiency  and  reliability  of  this  method. 

In  terms  of  implementation,  more  accurate  sonar  sensors  should  be  installed  on 
the  robots.  The  visual  localization  system  provides  many  constraints  for  testing, 
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and  is  thus  being  replaced  by  a  non-centralized  localization  system  that  integrates 
odometry  with  gyroscopes  and  accelerometers  as  well  as  an  inbuilt  compass. 

The  system  works  only  in  a  maze  like  environment,  in  future  it  should  work  in 
a  real  environment  with  less  restrictions.  For  this  purpose,  feature  extraction  should 
be  more  developed,  range  sensors  should  be  more  accurate  and  the  topological  map 
should  contain  some  geometric  data.  Perhaps  a  mixture  of  topological  and  geometric 
maps  will  be  useful  in  this  case. 
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Abstract  A  new  measurement  concept  for  cut-to-length  forest  harvesters  is  pre¬ 
sented  in  this  paper.  The  cut-to-length  method  means  that  the  trees  are  felled,  de- 
limbed  and  cut-to-length  by  the  single-grip  harvester  before  logs  are  transported 
to  the  roadside.  The  concept  includes  measurements  done  to  standing  trees  before 
felling  to  calculate  optimal  length  of  logs.  The  modern  forest  harvesters  use  me¬ 
chanical  measurements  for  diameter  and  length. 

In  this  paper,  we  will  discuss  different  non-contact  methods  of  measuring  a  tree 
stem  before  felling  and  during  the  cut-to-length  process.  Standing  tree  stems  are 
measured  with  a  3D  scanner  and  a  computer  vision  systems.  Trunk  processing  is 
measured  with  a  computer  vision  system.  Based  on  these  new  measurements,  tree 
cutting  pattern  could  be  optimized  and  harvester  automation  increased,  resulting  in 
higher  resource  utilization. 


1  Introduction 

A  long-term  vision  of  the  work  presented  here  is  that  forest  harvesters  could  be 
automatized  to  improve  the  overall  efficiency  and  quality  using  advanced  measure¬ 
ment  technology.  The  Metrix  project  is  a  broad  scale  effort  to  realize  this  vision. 
The  project  studies  new  measurement  technologies  for  forest  harvester  heads.  The 
research  goals  are  to  measure  and  estimate  the  tree  trunk  dimensions  and  other  qual¬ 
ity  variables  with  non-contact  methods.  The  estimation  is  done  in  real  time  so  that 
the  optimal  cutting  pattern  can  be  calculated  and  more  detailed  trunk  information 
can  be  sent  to  processing  mills.  This  research  focuses  on  applying  signal  process¬ 
ing  methods  to  machine  vision,  laser  measurement  and  other  optical  measurement 
technologies  in  demanding  forest  environment. 
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A  new  measurement  concept  for  cut-to-length  forest  harvesters  is  presented  in 
this  paper.  3D  scanner  and  machine  vision  based  measurements  are  combined  for 
measuring  standing  tree  stems  before  felling.  To  help  machine  vision  on  the  ap¬ 
proach  stage,  structured  light  is  also  studied  as  a  part  of  the  project  [6].  Felled  tree 
trunks  are  fed  through  the  head  and  cut  to  logs  at  desired  lengths.  Diameter  and 
length  of  the  processed  logs  are  measured  using  machine  vision  system. 

The  research  is  a  continuation  to  the  work  done  in  the  Forestrix  project  [5,  8,  9]. 
The  Forestrix  project  studied  forest  and  tree  trunk  measurement  technologies,  signal 
processing  methods  and  algorithms  for  semiautomatic  control  of  forest  harvesters. 
The  main  focus  was  to  produce  and  update  an  accurate  2D  tree  map  in  real  time, 
with  diameter  at  breast  height  (DBH)  information.  In  forest  thinning  operations,  the 
tree  map  can  support  the  harvester  operator  to  select  the  right  trees  and  to  achieve 
optimal  stand  density.  Semiautomatic  harvester  operation  with  tree  map  information 
was  tested  on  a  simulator.  The  collected  data  improves  the  verifiability  of  forest 
operations  and  the  data  can  be  used  for  planning  future  forest  tasks. 

This  paper  consists  of  the  following  sections:  First,  modern  harvesters,  tree  pa¬ 
rameters  and  measurement  platforms  are  discussed.  Second,  the  results  including 
tree  stem  laser  measurement,  motion  vision  based  structure  estimation  of  the  stand¬ 
ing  tree  stem  and  machine  vision  based  trunk  measurement  system  for  tree  process¬ 
ing  are  presented.  Finally,  some  conclusions  about  the  applicability  of  the  tested 
measurement  concept  are  given. 


1.1  Modem  Forest  Harvesters 


Modern  forest  harvesters  are  already  very  efficient  machines.  Harvester  heads  have 
several  functions  including  the  felling,  delimbing,  diameter  and  length  measuring, 
cutting  to  length,  color  marking  and  stump  treatment.  John  Deere  745  harvester 
heads  (Fig.  5  and  1)  have  been  used  in  real  forest  tests  during  this  project.  Measure¬ 
ment  and  data  gathering  are  very  important  in  modern  tree  harvesting.  The  forest 
owner  and  the  harvester  contractors  are  paid  according  to  the  harvester  measure¬ 
ments.  Forest  companies  use  harvester  information  to  plan  the  subsequent  forest 
operations. 

In  modem  forest  harvesters,  the  diameter  sensors  are  connected  to  the  feeding 
roller  arms  or  the  delimbing  knifes.  The  sensors  are  usually  potentiometers.  Di¬ 
ameter  measurement  depends  on  how  wide  the  feeding  arms  open.  The  length  of 
the  tree  is  measured  usually  with  a  2  channel  incremental  encoder.  During  process¬ 
ing,  the  tree  trunk  is  pressed  firmly  against  a  measurement  roller  disc,  see  Fig.  1). 
Weather  conditions  influence  the  measurement  accuracy,  e.g.,  measurement  roller 
spikes  penetrate  deeper  into  unfrozen  wood  than  into  frozen  wood.  Calibration  is 
done  regularly  to  guarantee  the  measurement  quality. 
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Fig.  1  Timberjack  1070  harvester  with  John  Deere  745  head  on  test  site  on  the  left  and  standing 
tree  measurement  sensors  attached  to  a  745  head  on  the  right. 


1.2  Tree  Parameters 


Measurements  of  standing  tree  stems  should  include  parameters  like  height  of  crown 
base,  taper,  sweep  and  lean.  Calculating  these  parameters  with  ground-based  laser 
scanners  for  forest  inventories  have  been  studied  [4,  10,  12].  But  the  possibility  of 
measuring  tree  parameters  with  a  moving  forest  harvester  before  felling  is  a  novel 
one.  Presently,  the  measurements  performed  in  moder  harvester  heads  are  obtained 
too  late  for  true  cut-to-length  pattern  optimization.  Thus,  having  measurements  of 
the  tree  stem  before  felling  gives  valuable  information  that  is  not  obtainable  with 
traditional  measuring  implementations. 


1.3  Measurement  System 

The  measurement  system  consists  of  different  sensors  used  for  measuring  standing 
tree  stems  and  cut-to-length  parameters.  The  system  has  been  used  on  different  plat¬ 
forms  to  collect  measurement  data.  Different  variations  of  the  measurement  systems 
have  been  developed  during  the  project.  This  section  describes  the  latest  system. 

The  measurement  system  used  in  all  terrain  vehicle  (ATV)  and  harvester  head 
tests  shown  in  Fig.  2  and  in  1.  The  front  box  on  ATV  in  Fig.  2  acts  as  a  stand  for 
the  sensors  while  the  box  in  the  back  contains  a  24  volt  battery  for  system  power. 
The  scanners  and  the  measurement  PC  operate  directly  from  the  battery.  The  sys¬ 
tem  sensors  in  Fig.  2  consist  of  2D  and  3D  laser  range  finders,  GPS  receiver  and 
MEMS  inertial  measurement  unit  (IMU).  IMU  is  used  to  provide  pose  information 
of  the  platform  and  is  essential  for  combining  various  measurements  together.  ATV 
measurements  have  been  done  with  different  development  versions  of  the  forest  3D 
scanner  system.  A  stereo  camera  pair  (Fig.  1  and  2)  is  used  to  measure  standing  tree 
structure  using  visual  motion  [7]  and  to  tree  trunk  cut-to-length  processing  (5). 
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Fig.  2  ATV  platform  with  3D  scanner  measurement  equipment  on  the  left  and  stereo  camera  pair 
on  the  right. 


Forest  harvester  tests  are  done  with  the  same  measurement  system  as  the  ATV 
tests.  Harvester  head  sensor  arrangement  for  standing  tree  measurement  is  shown 
in  Fig.  1.  Harvester  head  cut-to-length  processing  measurement  system  is  shown 
in  Fig.  5.  Processing  measurements  are  done  with  a  stereo  camera  pair.  Measure¬ 
ment  system  sensors  are  mounted  on  a  specially  designed  mounting  attached  to 
John  Deere  745  harvester  head.  See  Fig.  5  and  1. 


2  Standing  Tree  Stem  Laser  Measurement 

Robust  robot  navigation  in  unstructured  and  outdoor  environments  is  an  unsolved 
problem.  The  absence  of  simple  features  leads  to  the  need  for  more  complex  per¬ 
ception  and  modeling.  This  leads  to  a  big  variety  of  navigation  algorithms  and  map 
representations,  depending  on  the  kind  of  environment,  the  degree  of  structuring 
and  the  target  application.  Many  different  outdoor  Simultaneous  Localization  and 
Mapping  (SLAM)  algorithms  have  been  studied  in  recent  years  [2,  13]  and  [1]. 

In  this  case,  a  scan  correlation  based  method  is  used  for  short  term  sensor-based 
dead  reckoning.  There  are  numerous  different  scan  correlation  methods  available  to 
be  used  to  sensor-based  dead  reckoning.  The  Iterative  Closest  Point  (ICP)  and  Sum 
of  Gaussian  (SoG)  methods  are  among  the  most  popular.  Different  scan  correlation 
methods  are  presented  e.g.  by  Bailey  [1]. 

Scan  correlation  is  not  enough  to  combine  rotating  3D  scanner  measurements 
into  meaningful  tree  measurement  data.  A  SLAM  based  approach  to  harvester  head 
localization  is  used  to  calculate  head  movement  and  combine  measurements  to¬ 
gether.  The  method  combines  2D  laser  localization  with  IMU  measured  pose  infor¬ 
mation  and  height  from  the  3D  scanner  system  to  calculate  6  degrees  of  freedom 
(DOF)  movement  of  the  head  and  measurement  data  of  the  tree  stem.  The  research 
with  laser  scanners,  scan  correlation  and  SLAM  presented  here  is  continuation  to 
the  work  done  in  the  Forestrix  project  [5,  8,  9].  Different  filtering  methods  have 
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been  tested  to  provide  the  best  possible  6DOF  movement  but  the  work  is  still  in 
progress. 

Tree  stem  3D  scanner  measurement  is  studied  to  get  parameters  like  height  of 
crown  base,  taper,  sweep,  trunk  dimensions,  branches  and  lean.  3D  point  data  mea¬ 
surement  count  depends  on  for  how  long  the  6DOF  movement  estimation  is  accu¬ 
rate.  If  the  movement  can  be  accurately  estimated  for  the  whole  movement  to  the 
tree,  the  point  cloud  collected  is  more  precise  and  parameter  calculations  are  easier. 
If  the  movement  accuracy  is  low,  the  measurement  error  in  the  point  cloud  is  too 
great  for  precise  parameter  calculation.  From  the  measurements  taken  in  movement 
shown  in  Fig.  3  we  can  see  that  in  this  long  approach  (approximately  7  meter),  ac¬ 
curacy  of  the  movement  has  not  been  the  best  possible  to  combine  all  measurement 
points  into  one  point  cloud.  The  points  measured  from  close  do  not  match  the  points 
measured  from  a  greater  distance.  In  this  case,  it  is  better  to  use  shorter  3D  mea¬ 
surements  and  calculate  parameter  values  from  each  short  measurement  together  to 
get  better  results.  6DOF  movement  calculated  for  the  same  approach  is  shown  in 
Fig.  3.  The  height  estimate  has  the  highest  variance  of  the  6Dof  because  the  rough 
forest  ground  and  under  foliage  provide  a  poor  height  estimate.  The  ground  height 
estimation  from  3D  scanner  system  is  better  if  the  ground  is  flat  and  smooth. 


Fig.  3  Estimated  6DoF  movement  for  one  approach  measurement  in  well  defined  pine  forest 
(ATV)  (left).  3D  laser  scan  from  movement  (center).  Darker  measurements  when  the  laser  is  close 
to  the  tree.  Calculated  tree  parameters  (right). 


The  ground  level  is  searched  from  the  measurement  point  cloud  using  RANSAC 
algorithm  [3].  The  ground  level  detection  helps  us  to  find  the  location  the  tree  stump. 
The  trunk  diameters  are  calculated  in  different  height  segments  of  the  trunk  depend¬ 
ing  on  the  point  count.  The  diameter  is  calculated  using  cylinder  fitting  ([4,  10]  and 
[12])  or  simple  circle  fitting.  Taper  information  is  used  to  estimate  the  true  stem  and 
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extract  the  branches  in  the  next  trunk  piece.  Crown  base  height  can  be  estimated 
from  the  trunk  diameter  calculations.  The  extracted  branches  are  used  to  determine 
the  height  of  the  dry  branches  and  the  crown  height.  The  sweep  and  lean  can  be 
estimated  from  the  tree  stem  diameter  and  center  points.  The  branches  are  excluded 
from  the  diameter  calculations  and  are  shown  in  different  color  in  Fig.  3.  The  forest 
3D  scanning  system  with  90  degree  field  of  view  is  designed  for  measuring  only 
tree  stems  up  to  crown  height.  Goal  is  to  measure  tree  stem  diameters  with  error 
less  than  1  cm.  The  research  done  so  far,  indicates  that  this  can  be  achieved. 


3  Motion  Vision  Based  Structure  Estimation  in  Natural  Forest 
Environment 

Motion  vision  can  be  used  to  determine  world  structure  from  a  video  sequence.  A 
general  research  problem  in  this  context  is  to  acquire  relevant  information  through 
relative  motion  of  cameras  and  the  environment.  Using  cameras  to  sensing  the  near 
surroundings  is  beneficial  due  to  mass  production  of  camera  components  (low  cost) 
and  wide  availability  (good  support).  In  addition,  cameras  are  generally  applicable 
to  various  environmental  conditions  assuming  that  related  research  problems  are 
solved.  Forest  is  a  good  example  of  a  challenging  environment  where  e.g.  occlusion, 
varying  light  conditions  and  natural  uncontrolled  conditions  provide  difficulties  for 
computer  vision  tasks  and  algorithms. 

The  precise  objective  of  this  work  was  to  measure  trees  from  a  distance  using 
a  motion  vision  approach.  In  addition,  the  whole  visible  environment  structure  is 
reconstructed.  Monochrome  digital  video  cameras  were  attached  on  an  ATV  and  a 
video  sequence  was  recorded  while  the  ATV  approaches  a  tree.  Data  from  a  single 
camera  was  used  but  a  calibrated  stereo  camera  system  was  present  to  gather  richer 
data  for  future  use.  Based  on  the  measurements,  tree  cutting  could  be  optimized  and 
harvester  automation  increased,  resulting  in  higher  resource  utilization  efficiency. 
An  example  of  a  structure  estimate  for  an  instantaneous  moment  in  forest  is  shown 
in  Fig.  4. 

A  broad  range  of  motion  vision  methods  were  explored.  Based  on  the  findings, 
the  final  solution  consists  of  three  sub-components:  block  matching,  motion  estima¬ 
tion  and  triangulation.  A  consequent  pair  of  images  from  a  video  sequence  is  used 
as  the  source  data.  The  block  matching  algorithm  uses  a  hierarchical  image  pyramid 
approach  with  three  phases  at  each  level  of  hierarchy.  First,  integer  pixel  precision 
solution  for  a  dense  optical  flow  field  is  computed  with  a  limited  range  full  search. 
Second,  linear  sub-pixel  interpolation  is  used  to  fine-tune  the  integer  precision  re¬ 
sults.  Third,  the  dense  optical  flow  field  is  filtered  adaptively  to  reduce  noise  and  to 
compensate  for  occlusion  errors.  Estimating  motion  from  the  optical  flow  is  done 
by  using  a  selected  subset  of  points  for  which  an  optical  flow  fit  error  is  minimized 
using  numerical  optimization  to  solve  for  both  the  motion  parameters  and  depth  for 
the  selected  points  simultaneously.  Finally,  solving  for  the  dense  structure  from  the 
motion  and  optical  flow  can  be  formulated  as  a  linear  triangulation  problem  and  is 
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Fig.  4  A  structure  estimate  for  an  instantaneous  moment  in  a  forest.  Color  brightness  indicates  the 
depth;  the  brighter,  the  further  away. 


thus  easy  to  calculate.  Both  block  matching  and  triangulation  can  be  computed  in 
parallel.  Thus,  parallel  computation  power  of  multiple  processor  cores  or  modern 
graphics  cards  can  be  used  in  the  future  to  meet  the  relative  high  computation  cost 
of  the  algorithms. 

The  results  obtained  show  robustness  with  respect  to  environmental  challenges 
and  the  main  objective  of  tree  segmentation  for  measurement  is  achieved.  In  ad¬ 
dition,  overall  depth  map  construction  quality  is  sufficient  for  a  more  broad  range 
of  potential  applications.  In  summary,  the  results  prove  that  motion  vision  methods 
can  be  applied  in  uncontrolled  forest  environment  conditions.  More  detailed  expla¬ 
nation  on  motion  vision  based  structure  estimation  in  natural  forest  environment  can 
be  found  in  [7]. 
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4  Machine  Vision  Based  Trunk  Measurement  System 


The  objective  was  to  develop  a  system  capable  of  measuring  the  length  and  thick¬ 
ness  of  a  processed  trunk.  Current  solutions  for  measuring  the  length  utilize  a  cog 
wheel  that  follows  the  surface  of  the  wood.  Accuracy  of  present  systems  depends 
on  the  qualities  of  the  wood,  e.g.,  how  soft  the  bark  is  and  how  many  branches  there 
are.  The  measurement  error  is  usually  around  1-2  cm.  The  thickness  of  the  trunk  is 
measured  mechanically  with  two  arms. 

Stereo  camera  pair  was  used  to  track  both  the  pose  of  the  harvester  head  and 
the  movement  of  the  trunk.  The  system  uses  two  black  and  white  Foculus  F0134S 
cameras  having  a  resolution  of  640x480.  In  the  test,  the  capture  rate  used  was  60 
frames  per  second.  The  cameras  were  fixed  to  the  upper  part  of  the  harvester  head. 
Calibration  of  camera  and  stereo  parameters  was  done  in  advance.  As  lighting  we 
used  originally  halogen  lamps  and  in  later  test  LED  lights  synchronized  with  the 
cameras.  Stereo  camera  pair  in  harvester  head  is  shown  in  Fig.  5. 


Fig.  5  Harvester  head  with  stereo  camera  pairs  and  lights  attached  (above).  Left  and  right  stereo 
images  showing  selected  features,  trunk  estimate  and  the  tracked  chessboard  pattern. 


The  basic  problem  to  be  solved  was  that  there  are  usually  three  different  motions 
in  respect  to  the  cameras  that  have  to  be  distinguished;  harvester  heads  motion, 
trunk’s  motions  and  background’s  motion.  All  of  these  motions  have  to  be  presented 
using  both  rotation  and  translation. 

The  basic  algorithm  begins  by  selecting  some  good  Harris  corners  from  the  trunk. 
Then  the  features  were  stereo  matched  and  tracked  in  consecutive  frames  so  that 
3D  motion  for  corresponding  points  could  be  calculated.  To  ease  the  selection  and 
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matching  of  features,  a  cylindrical  estimate  of  the  trunk  was  used  (Fig.  5).  It  was 
found  that  the  number  of  false  matches  between  left  and  right  images  was  reduced 
when  a  priori  estimate  of  the  z-coordinates  was  used. 

Lower  part  of  the  harvester  head  rotates  around  a  single  axis  and  for  that  reason 
it  is  possible  to  reduce  the  six  parameter  pose  estimation  problem  of  the  harvester 
head  to  a  one  parameter  estimation  problem.  The  pose  of  the  harvester  head  can  be 
determined  by  tracking  selected  features  (e.g.  [11])  or  by  estimating  the  pose  of  the 
chessboard  pattern  fixed  in  the  harvester  head  (Fig.  5. 

Two  different  approaches  were  used  to  determine  the  length  of  the  wood.  The 
first  method  tracks  harvester  head  and  trunk  separately.  A  single  point  in  the  end  of 
the  trunk  is  selected  and  its  location  is  estimated  from  the  translation  and  rotation  of 
the  trunk.  The  length  of  the  wood  is  then  determined  by  the  distance  of  the  original 
point  and  the  estimated  end  point. 

The  second  approach  is  based  on  the  assumption  that  the  trunk’s  rotation  is  nearly 
identical  to  harvester  head’s  rotation.  When  the  rotation  and  translation  of  the  har¬ 
vester  head  is  estimated,  it  is  possible  to  remove  it  from  the  trunk’s  motion  vectors. 
When  harvester  head’s  motion  is  compensated,  the  translational  movement  of  the 
trunk  can  be  extracted  and  the  length  of  the  trunk  determined. 

Seven  trees  have  been  cut  to  length,  estimated  and  compared  to  handmade  mea¬ 
surements.  All  the  measured  trunks  were  about  three  meters  long.  The  absolute  dif¬ 
ferences  between  estimates  and  the  real  lengths  with  the  first  approach  was  in  five 
cases  no  more  than  1  mm,  with  largest  estimate  error  being  12  mm.  The  second 
method  gave  slightly  worse  results,  but  still  the  error  was  in  most  cases  6  mm  or 
less. 

The  data  measured  has  been  relatively  easy  and  the  more  challenging  data  has 
not  yet  been  fully  analyzed.  When  the  trunk  moves  faster  and  harvester  head  swings 
more,  estimation  of  motion  becomes  harder.  Algorithms  for  estimating  the  3D  struc¬ 
ture  and  thickness  of  the  trunks  are  still  under  development. 


5  Conclusions  and  Future  Work 

A  new  concept  to  cut-to-length  forest  harvester  remote  sensing  system  was  pre¬ 
sented  in  this  paper.  The  concept  includes  new  non-contact  measurements  done  be¬ 
fore  felling  and  during  cut-to-length  processing.  The  research  goals  are  to  measure 
and  estimate  the  tree  stem  dimensions  and  other  quality  variables  affecting  the  pro¬ 
cessing  of  the  trunk.  This  research  focuses  on  applying  signal  processing  methods 
to  machine  vision,  laser  measurement  and  other  optical  measurement  technologies 
in  demanding  forest  environment.  Based  on  the  research,  tree  cutting  pattern  could 
be  optimized  and  harvester  automation  increased,  resulting  in  higher  resource  uti¬ 
lization. 

The  standing  tree  stem  laser  measurement,  the  machine  vision  based  trunk  mea¬ 
surement  and  the  motion  vision  based  structure  estimation  systems  were  presented. 
The  current  measurement  and  estimation  methods  implemented  show  promise  that 
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the  measurement  goals  set  for  the  project  will  be  met  in  well  defined  forests.  How¬ 
ever,  dense  and  cluttered  forest  environment  is  tough  for  precision  measurements 
and  will  need  more  research. 

The  work  presented  in  this  paper  is  a  part  of  an  ongoing  research  project.  Har¬ 
vester  head  tests  are  ongoing  and  algorithm  development  is  unfinished.  Precision 
of  the  measured  parameters  is  extremely  important  and  research  to  better  compare 
forest  machine,  hand,  laser  and  machine  vision  measured  tree  stem  parameters  is 
underway. 
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Abstract  Preventive  maintenance  of  high-voltage  transmission  power  lines  is  a 
dangerous  task,  but  the  obstacles  mounted  on  the  lines  have  so  far  prevented  the 
automation  of  this  task.  Expliner  aims  to  overcome  such  obstacles  by  controlling 
actively  the  position  of  its  center  of  mass,  thus  changing  its  configuration  as 
needed  when  moving  on  the  power  lines.  This  work  presents  the  design  of  Explin¬ 
er  and  results  of  field  experiments  performed  with  very  high  voltages  to  prove  the 
effectiveness  of  the  proposed  concept. 


Introduction 

Urban  centers  and  industries  rely  heavily  on  electric  energy  provided  by  an 
electric  grid.  This  electric  energy  is  needed  for  safety,  transportation,  sanitation 
and  other  essential  functions.  If  there  is  any  problem  in  the  electric  grid  linking  the 
energy  generation  plants  and  urban  or  industrial  centers,  the  supply  of  energy  may 
have  to  be  stopped,  affecting  in  a  negative  way  the  lives  of  millions  of  people  [1]. 
In  order  to  avoid  such  problems,  the  preventive  maintenance  of  electric  power 
lines  is  of  vital  importance.  However,  this  is  a  very  dangerous  and  time¬ 
demanding  job,  requiring  specialized  people  to  walk  on  the  lines,  suspended  sev¬ 
eral  tens  of  meters  above  the  ground  in  remote  areas  like  mountains  and  deserts.  In 
addition,  usually  the  energy  supply  must  be  interrupted  momentarily  for  such  in¬ 
spections. 

There  have  been  proposals  to  carry  on  the  inspection  of  power  lines  in  more  ef¬ 
ficient  ways.  One  involves  the  use  of  helicopters  to  check  visually  the  conditions 
of  the  cables  [2].  However,  even  in  the  case  of  remotely  controlled  helicopters  [3], 
it  may  be  dangerous  to  fly  close  to  power  lines,  and  the  visual  inspection  will  pro¬ 
vide  images  of  only  the  upper  side  of  the  cables.  Another  approach  is  to  have 
people  working  directly  on  live  lines  with  special  gear  to  insulate  them  from  the 
very  high  voltages  [4].  This  method  requires  good  access  to  the  cables  from  the 
ground,  which  may  not  be  possible  in  mountains  or  other  remote  locations.  In  ad¬ 
dition,  it  still  depends  on  people  moving  on  the  cables,  a  risky  operation. 
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In  the  field  of  robotics,  several  researchers  have  proposed  machines  for  remote 
inspection  of  power  lines.  However,  the  presence  of  obstacles  on  the  cables  (such 
as  cable  spacers  and  clamp  suspenders  connected  to  the  towers)  makes  the  auto¬ 
mation  of  this  task  more  difficult.  Campos  et  al.  have  proposed  a  machine  to  in¬ 
spect  the  warning  spheres  installed  on  the  high-voltage  lines  [5],  but  the  machine 
is  not  able  to  overcome  any  sort  of  obstacle.  Sawada  et  al.  have  proposed  a  robot 
with  several  degrees  of  freedom  in  order  to  cross  clamp  suspenders  [6],  but  this 
resulted  in  a  bulky  and  heavy  machine  that  is  difficult  to  carry  to  the  field.  Tang 
and  Zhu  have  proposed  a  different  machine  that  moves  on  the  ground  line,  above 
the  high  voltage  lines  and  with  fewer  obstacles  [7]  [8].  However,  the  inspection 
performed  with  this  machine  is  similar  to  the  one  with  a  helicopter,  since  only  the 
upper  side  of  the  high  voltage  cables  can  be  inspected  with  a  video  camera.  Mon- 
tambault  and  Pouliot  are  developing  a  very  practical  machine  for  inspection  of 
single  high-voltage  lines  [9]  [10],  including  the  defrosting  of  frozen  wires.  In  Ja¬ 
pan,  where  the  current  research  is  being  conducted,  the  high-voltage  lines  are 
grouped  in  bundles  of  four  cables,  and  this  presents  different  challenges  that  have 
not  been  solved  in  a  satisfactory  way  until  now. 


Proposal  of  Automation 

Figure  1  shows  Expliner,  the  machine  proposed  by  the  authors  to  perform  re¬ 
mote  inspection  of  high-voltage  lines.  Its  mobility  is  based  on  pulleys,  placed  on 


Fig.  1  -  Concept  of  Expliner  with  main  components. 
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Fig.  2  -  Loading  pipe  between  tower  and  cables  (left)  and  clamp  suspender  (right). 

the  upper  cables,  driven  by  electric  motors.  The  pulley  units  are  connected  to  a 
single  horizontal  base. 

The  horizontal  base  also  has  a  vertical  element,  with  a  2-DOF  manipulator 
connected  to  its  lower  end.  On  the  tip  of  the  manipulator  there  is  a  counter-weight, 
housing  batteries  and  electronics.  Therefore,  by  moving  the  manipulator,  it  is 
possible  to  change  the  position  of  the  center  of  mass  of  the  machine.  Thus,  it  be¬ 
comes  possible  to  lift  one  of  the  two  pulley  units  in  order  to  overcome  large  ob¬ 
stacles,  as  presented  for  the  first  time  in  [1 1]. 

If  Expliner  would  move  only  in  a  straight  line,  it  would  be  necessary  to  have 
only  1  degree  of  freedom  in  the  manipulator.  However,  there  are  times  when  the 
machine  must  perform  rather  complex  motions,  such  as  when  crossing  clamp  sus¬ 
penders,  or  when  moving  on  an  inclined  loading  pipe  between  the  tower  and  the 
main  cables,  as  displayed  in  Figure  2.  For  such  cases,  not  only  the  2  DOFs  of  the 
manipulator  are  needed,  but  also  the  rotation  around  the  vertical  axis  of  each  mo¬ 
tion  unit.  These  motions  have  already  been  described  in  [1 1]. 

The  next  sections  will  focus  on  the  mechanical  design  of  Expliner,  and  will  al¬ 
so  describe  the  main  differences  between  the  current  machine  and  the  first  proto¬ 
type,  presented  in  [1 1]. 


Fig.  3  -  Friction  experimental  setup  (left)  and  results  from  experiments  (right). 
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Mechanical  Design 


Friction 

In  order  to  dimension  the  actuators,  it  was  necessary  to  determine  the  friction 
between  the  rubber-coated  pulleys  and  the  power  cables.  In  spite  of  the  theoretical 
models  taking  into  account  the  deformation  of  the  rubber  around  the  contact  area, 
the  lack  of  friction  data  made  it  necessary  to  determine  the  friction  empirically,  by 
applying  loads  on  the  pulleys,  mounted  on  the  cables,  and  measuring  the  force  re¬ 
quired  to  pull  them  at  a  constant  speed.  The  experimental  setup  and  the  data  are 
displayed  in  Figure  3,  and  show  that  the  pulling  force  is  linearly  proportional  to 
the  load  applied  on  the  pulleys,  yielding  a  coefficient  of  friction  that  was  used  for 
the  selection  of  the  actuators. 


Actuators 

All  actuators  were  designed  by  the  authors,  so  that  the  motors  and  transmis¬ 
sions  be  assembled  in  the  most  compact  and  effective  way.  In  addition,  the  main 
components  of  Expliner  can  be  assembled  by  sliding  joints,  which  were  also  in¬ 
corporated  in  the  design  of  the  cases  of  the  actuators. 

Each  motion  unit  is  equipped  with  a  200W  brushless  motor  connected  to  a  train 
of  planetary  gears  with  a  total  reduction  ration  of  60,  maximum  continuous  torque 
of  40.4Nm  and  speed  of  60.6rpm.  With  the  pulleys  of  Expliner,  this  represents  a 
linear  speed  of  between  23m/min  and  29m/min,  depending  on  the  size  of  the 
cables  where  the  machine  is  moving. 

The  vertical  axis  of  each  motion  units  is  powered  by  a  200W  brushless  motor 
with  Harmonic-Drive  reduction  embedded  in  its  case,  resulting  in  a  maximum 
continuous  torque  of  558.4Nm  and  3.6rpm.  The  same  actuator  is  used  to  drive  the 
first  joint  of  the  manipulator  (the  “shoulder”),  while  the  second  joint  (the  “elbow”) 
is  driven  by  a  200W  brushless  motor  with  embedded  Harmonic-Drive,  and  with  a 
maximum  continuous  torque  of  349Nm  and  speed  of  5.7rpm. 

The  dimensioning  of  the  actuators  of  the  manipulator  took  into  account  lifting 
the  counter- weight  when  the  manipulator  is  in  a  vertical  configuration,  something 
that  does  not  happen  in  real  applications,  but  which  may  happen  during  tests. 


Structure 

The  structure  of  Expliner  is  composed  mainly  of  CFRP  pipes,  and  they  were 
designed  to  withstand  the  extreme  postures  required  when  overcoming  obstacles. 

In  order  to  make  Expliner  easy  to  carry  to  the  inspection  sites,  it  was  designed 
to  be  easily  assembled  with  sliding  joints  and  stopping  pins,  as  shown  in  Figure  4. 
In  places  where  the  wiring  might  prevent  an  easy  assembly,  slide-in  connectors 
were  also  employed. 
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Fig.  4  -  Sliding  joint  for  easy  assembly. 


Safety  Hooks 

Another  improvement  from  the  first  prototype  of  Expliner  was  the  introduction 
of  the  safety  hooks,  which  keep  the  robot  attached  to  the  power  lines  even  in  case 
of  sudden  winds  or  accidents.  The  safety  hooks  must  provide  enough  clearance  to 
operate  even  with  large  loading  pipes,  but  must  be  very  close  to  the  pulley  in  its 
engaged  position  to  prevent  the  cable  from  sliding  at  the  gap  between  hook  and 
pulley.  Therefore,  the  safety  hooks  were  assembled  with  a  cam  mechanism  that 
modifies  the  angle  of  the  hook  as  it  moves  from  the  disengaged  position  (open)  to 
the  engaged  position  (closed),  as  shown  in  Figure  5.  Each  safety  hook  is  driven  by 
a  worm-gear  transmission,  so  it  is  self-locking. 

It  is  necessary  to  disengage  the  safety  hooks  when  crossing  obstacles  such  as 
cable  spacers  or  clamp  suspenders,  but  also  when  moving  on  the  loading  pipe. 
Care  is  taken  to  insure  that  at  all  times  at  least  one  safety  hook  is  engaged,  so  that 


Fig.  5  -  Safety  Hook  disengaged  (left)  and  engaged  (right). 
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Expliner  would  not  fall  from  the  cables  in  case  of  malfunctions  or  accidents. 

The  new  version  of  Expliner  is  easier  to  assemble,  easier  to  operate,  and  also 
lighter  that  the  first  prototype  (60kg,  against  85kg  of  the  first  version). 

Control  Architecture 

The  control  architecture  has  been  greatly  improved  since  the  first  prototype 
presented  in  [11].  Each  actuator  has  its  motor  drive  (HiBot  TITech  Drive  Version 
1)  positioned  close  to  it,  in  a  splash-proof  assembly.  Communication  between  the 
motor  drives  and  the  main  micro-controller,  positioned  in  the  counter-weight,  was 
implemented  with  CAN  bus. 

By  installing  the  motor  drives  close  to  the  motors,  wiring  in  the  joints  of  the 
robot  was  greatly  reduced  and  simplified.  In  addition,  the  entire  machine  is 
shielded  for  operation  at  very  high  voltages  (500kV)  and  with  high  electric  cur¬ 
rents  (1400A). 

The  counter-weight  houses  not  only  the  main  micro-controller  (HiBot 
SH2Tiny),  but  also  the  wireless  communication  devices,  which  connect  to  a  port¬ 
able  control  station  by  wireless  LAN.  The  wireless  communication  diagram  is  pre¬ 
sented  in  Figure  6. 


Human-Machine  Interface 

Expliner  is  controlled  from  a  portable  control  case  shown  in  the  lower  right 
comer  of  Figure  6.  From  this  control  case  it  is  possible  to  drive  all  joints  indepen¬ 
dently,  but  this  would  result  in  cumbersome  and  time- demanding  operations. 
Therefore,  the  obstacle-crossing  motions  were  automated,  but  are  always  con¬ 
trolled  by  the  operator,  as  described  next. 

When  crossing  a  clamp  suspender  (Figure  7),  the  operator  controls  the  position 
of  the  center  of  mass  by  moving  a  joystick  forward  or  backward,  while  the  two 
joints  of  the  manipulator  are  driven  automatically,  so  that  the  center  of  mass  is  al¬ 
ways  positioned  between  the  two  upper  cables.  Once  the  front  motion  unit  is  lifted 
with  enough  clearance  from  the  cables,  the  operator  starts  the  next  step  in  the  mo¬ 
tion,  which  consists  of  rotating  the  front  motion  unit  and  bringing  the  pulleys  out 
from  the  cables,  while  at  the  same  time  the  position  of  the  center  of  mass  is  auto¬ 
matically  changed  to  accommodate  for  this  change  in  posture.  These  motions  are 
also  achieved  simply  by  moving  a  joystick  forward  or  backward.  Once  this  is 
completed,  the  operator  moves  the  machine  forward.  When  the  front  unit  has 
passed  the  obstacle,  it  is  brought  back  to  its  initial  angle  by  a  single  joystick  mo¬ 
tion.  Finally,  when  the  front  motion  unit  is  aligned  to  the  cables,  the  center  of 
mass  is  brought  back  to  the  center  position,  which  brings  the  front  pulley  unit  back 
onto  the  cables.  This  process  is  then  repeated  to  lift  the  rear  motion  unit. 

All  joints  are  driven  in  position  control,  with  data  from  encoders.  All  motion 
patterns  and  limits  are  pre-defmed,  in  order  to  make  operation  as  fast  and  safe  as 
possible.  Therefore,  the  joints  of  Expliner  will  move  until  the  defined  angle,  and 
will  always  go  back  to  the  initial  postures  with  an  accuracy  of  0.1°. 
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Fig.  6  -  Wireless  communication  diagram 


Fig.  7  -  Sequence  of  motions  for  overcoming  clamp  suspender  (first  half  of  motion) 


Fig.  8  -  Graphic  Interface  with  data  from  each  actuator  and  other  sensors. 

Similar  automated  patterns  have  been  implemented  for  other  motions,  such  as 
moving  between  the  loading  pipe  and  the  cables,  so  that  the  center  of  mass  is  al¬ 
ways  kept  in  a  safe  position.  This  automated  control  is  especially  useful  when  Ex- 
pliner  is  moving  on  the  single  loading  pipe.  However,  even  with  this  level  of  au¬ 
tomation,  the  presence  of  a  human  operator  is  always  required  due  to  the 
complexity  of  the  environment. 

The  operator  is  always  receiving  telemetry  data  from  the  robot,  including  the 
temperature  of  each  motor,  the  current  consumption,  the  voltage  level,  the  angle  of 
each  joint,  and  the  attitude  of  the  machine,  with  its  spatial  configuration.  The 
graphical  interface  presented  in  Figure  8  makes  the  control  of  the  machine  more 
intuitive  and  easy  to  understand. 


Field  Experiments 

The  latest  prototype  of  Expliner  has  undergone  intense  field  tests  with  very 
promising  results.  Speed  tests  performed  on  horizontal  cables  indicated  an  average 
speed  of  27m/min,  above  the  requirement  of  20m/min.  The  required  speed  of 
20m/min  was  set  based  on  limitations  in  the  sensors  used  to  inspect  the  cables, 
which  will  be  presented  in  the  near  future. 

Expliner  was  also  tested  with  all  cable  spacers  used  in  West  Japan  (Figure  9), 
where  it  is  supposed  to  be  deployed.  Additionally,  Expliner  was  tested  on  inclined 
cables,  with  a  maximum  angle  of  30  degrees,  and  even  in  such  conditions  was  able 
to  perform  the  obstacle  crossing  motion,  as  displayed  in  Figure  9. 

Overcoming  the  clamp  suspender  and  crossing  to  the  other  side  of  a  tower  was 
also  verified  in  field  experiments,  as  shown  in  Figure  10.  The  introduction  of  the 
automated  control  helped  to  reduce  the  time  necessary  to  perform  this  motion, 
from  around  12  minutes  to  approximately  3  minutes  (with  a  trained  operator). 
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Finally,  the  motion  on  the  single  loading  pipe  and  the  transition  motion  be¬ 
tween  loading  pipe  and  cable  were  confirmed  in  real  field  conditions,  as  shown  in 
Figure  11.  All  tests  were  performed  repeatedly  on  test  cables  and  also  on  live 
wires  with  500kV.  In  the  latter  case,  footage  obtained  with  a  ultra-violet  camera 
revealed  the  existence  of  a  corona  around  the  robot,  but  no  malfunction  was  ob¬ 
served,  thus  proving  the  effectiveness  of  the  shielding. 

Conclusions 

With  the  development  of  Expliner,  the  automation  of  inspection  of  high  voltage 
power  lines  became  one  step  closer  to  reality.  The  field  tests  presented  in  this  pa¬ 
per  showed  that  the  concept  of  changing  the  position  of  the  center  of  mass  of  the 
machine  can  be  employed  in  dangerous  and  complex  applications  such  as  power 
lines  suspended  tens  of  meters  above  the  ground.  The  developments  in  human- 
machine  interface  and  the  automation  of  complex  motions  have  made  the  control 
of  the  machine  faster  and  more  reliable.  The  deployment  of  sensors  to  acquire  data 
from  the  power  lines  is  a  future  step,  but  the  authors  are  already  working  on  it,  and 


Fig.  9  -  Cable  spacers  (left)  and  Expliner  crossing  obstacle  on  inclined  cable  (right). 


Fig.  10  -  Expliner  lifting  the  front  motion  unit  (left)  and  rotating  the  motion  unit  after 
crossing  the  clamp  suspender  (right). 


Fig.  11  -  Expliner  moving  on  loading  pipe  (left)  and  performing  an  automated  motion, 
plan  to  present  concrete  results  in  the  near  future. 
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Model  Predictive  Control  for  Mobile 
Robots  with  Actively  Reconfigurable 
Chassis 

P.  Michael  Furlong,  Thomas  M.  Howard,  and  David  Wettergreen 


Abstract  Actively  reconfigurable  chassis  enable  planetary  mobile  robots  to 
access  more  varieties  of  terrain.  While  typical  approaches  for  exploiting  such 
mechanisms  reply  on  feedback  control,  it  is  beneficial  to  consider  actively 
controlled  elements  at  planning  time  rather  than  during  motion  execution. 
In  this  paper  we  present  an  approach  for  extending  work  in  model-predictive 
trajectory  generation  to  actively  reconfigurable  chassis.  The  motion  planner 
uses  a  kinematic  motion  model  and  a  terrain  shape  model  to  determine  se¬ 
quences  of  actions  that  minimize  a  cost  function  over  vehicle  attitude  by 
modifying  the  shape  of  the  velocity,  curvature,  and  chassis  configuration  pro¬ 
files.  Simulation  and  field  results  are  presented  demonstrating  the  benefits  of 
this  technique  on  a  prototype  mobile  robot  for  lunar  excavation. 


1  Introduction 

As  exploration  of  our  universe  continues,  it  becomes  necessary  to  navigate  in 
challenging,  cluttered  terrain  to  examine  geologic  records,  search  for  records 
of  microorganisms,  and  potentially  discover  life.  In  such  environments,  ob¬ 
stacles  cannot  entirely  be  avoided,  they  must  be  traversed  in  a  manner  that 
minimizes  the  risk  to  the  platform.  Adjustable  chassis  improve  the  mobile 
robots  ability  to  overcome  harsh  terrain  by  shifting  the  center  of  gravity  to 
maximize  stability. 

Most  modern  platforms  with  actively  reconfigurable  chassis  adjust  their 
configuration  via  feedback  control,  maximizing  a  stability  criterion  on  the 
current  state  of  the  vehicle.  A  potentially  better  solution  is  to  generate  tra¬ 
jectories  using  a  predictive  model  of  vehicle  motion  or  stability  that  includes 
the  active  chassis  freedoms. 
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A  potentially  better,  safer  solution  is  to  generate  inputs  and  trajectories 
based  on  a  prediction  of  vehicle  motion  and  configuration  articulation  through 
the  environment.  Deliberative  planning  allows  a  greater  understanding  of 
traversability  and  enables  improved  route  selection  decisions. 


1.1  Related  Work 

Related  work  spans  research  on  exploiting  vehicles  with  reconfigurable  chassis 
and  research  on  the  generation  of  trajectories  for  a  robot  that  optimizes  some 
cost  function. 

Iagnemma  et  al.  present  an  optimization  technique  for  externally  recon¬ 
figurable  four-wheeled  vehicles  that  maximizes  stability  over  rough  terrain 

[5] .  Schenker  et  al.  incorporates  that  technique  into  a  system  for  planning 
vehicle  configurations  at  different  points  in  the  trajectory  [10].  Ishigami  et  al. 

[6]  modifies  the  technique  in  [4]  to  incorporate  models  of  wheel  slip  to  deter¬ 
mine  minimal  slip  paths  that  exploit  the  steering  capabilities  of  the  robot. 
Nakamura  et  al.  [9]  show  that  repositioning  the  center  of  mass  of  the  vehicle 
improves  the  static  stability  while  reducing  energy  consumption  of  the  vehicle 
during  traverse. 

The  work  of  Howard  and  Kelly  [4]  employs  optimization  procedures  to 
produce  trajectories  that  minimize  terminal  state  position  error  and  generates 
trajectories  that  account  for  predictive  motion  models  of  the  vehicle  as  well  as 
models  of  the  terrain  the  vehicle  has  to  cross.  Farritor,  Hacot  and  Dubowksy 
[2]  use  a  genetic  algorithm  to  combine  pre-scripted  action  modules  to  produce 
plans  that  are  then  evaluated  for  feasibility  and  vehicle  safety.  While  the 
approach  does  account  for  vehicle  physics  it  deals  with  actions  sampled  from 
the  configuration  space  of  the  vehicle  and  not  the  continuum  of  the  command 
profile  parameterization. 

Mobile  robot  stability  is  addressed  by  Diaz-Calderon  and  Kelly  [1],  Schenker 
et  al.  [10],  and  Iagnemma  et  al.  [5]  by  examining  the  relationship  between 
the  center  of  mass  of  the  vehicle  and  the  edges  of  the  polygon  of  support. 
Messuri  and  Klein  [8]  as  well  as  Hirose,  Tsuagoshi,  and  Yoneda  [3]  determine 
the  stability  of  the  vehicle  as  a  function  of  the  energy  require  to  induce  a  tip 
over. 


1.2  Discriminators 

While  the  results  of  this  work  are  similar  to  the  control  approaches  developed 
by  [2,  6],  it  is  distinct  in  two  ways.  First,  the  sidearm  angle  command  profiles 
are  produced  through  continuum  optimization  rather  than  the  action-library 
based  approach  of  [2].  As  such  the  resulting  trajectory  is  inherently  feasible 
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as  the  presented  technique  optimizes  the  inputs  to  the  control  system  in¬ 
stead  of  sampled  actions  of  the  vehicle.  Our  approach  differs  from  [6]  in  that 
it  searches  the  local  continuum  and  optimizes  a  criterion  based  on  vehicle 
stability  instead  of  minimizing  slip. 

2  Technical  Approach 

The  trajectory  optimization  technique  presented  in  this  paper  is  an  exten¬ 
sion  of  model-predictive  trajectory  generation  for  mobile  robots  with  actively 
reconfigurable  chassis.  This  section  presents  an  overview  of  the  posture  opti¬ 
mization  trajectory  generation  technique,  a  discussion  of  the  predictive  mo¬ 
tion  model  and  the  cost  functions  applied  in  this  work. 

2.1  Active  Posturing 

Actively  reconfigurable  chassis  enable  mobile  robots  to  adjust  their  center 
of  mass  to  improve  traction  and  achieve  a  more  stable  configuration.  Mobile 
robots  such  as  Scarab  [12]  and  the  Sample  Return  Rover  [7]  are  able  to  inde¬ 
pendently  adjust  the  independent  chassis  wheel  bases  by  setting  the  sidearm 
angle.  Actively  adjustable  wheel  bases  enable  posturing  of  the  robot  body  in 
a  ways  that  are  not  available  to  fixed  chassis  robots  (Figure  1). 


A  A-y- 
w  ^  w  % 

(a)  Raised  Chassis  (b)  Nominal  Configuration  (c)  Lowered  Chassis 

(d)  Chassis  Lean  Left  (e)  Chassis  Lean  Right 

Fig.  1:  Articulation  capabilities  of  the  Scarab  mobile  robot.  Symmetrical  values  for 
the  sidearm  chassis  angles  enables  the  body  to  raised  (a)  to  avoid  small  obstacles  or 
lowered  (c)  to  reduce  the  risk  of  tip  over.  Asymmetrical  sidearm  chassis  angles  enables 
the  body  to  be  postured  left  (d)  and  right  (e)  to  achieve  more  stable  configurations 
on  rocks  and  slopes. 
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2.2  Model  Predictive  Trajectory  Generation 

Typical  motion  planning  or  navigation  tasks  simply  require  that  a  mobile 
robot  move  from  one  pose  to  another.  Often  there  are  no  constraints  imposed 
on  the  internal  configuration  of  the  vehicle  over  the  course  of  the  trajectory. 
Given  a  potentially  infinite  number  of  chassis  configurations,  the  problem 
becomes  continuum  search  for  the  optimal  chassis  configuration  along  the 
trajectory  that  satisfies  the  boundary  state  constraints. 

The  presented  approach  is  an  extension  of  model-predictive  trajectory  gen¬ 
eration  that  optimizes  vehicle  posture  in  addition  to  satisfying  boundary  state 
constraints.  It  leverages  parameterized  profiles  to  describe  continuous  inputs 
in  order  to  reduce  the  scope  of  the  continuum  search.  For  a  mobile  robot  with 
an  actively  reconfigurable  chassis  as  shown  in  Figure  1,  inputs  can  be  defined 
for  the  curvature  (ft(p,x, £)),  linear  velocity  (u(p,x, £)),  and  sidearm  angles 
(^e/t(p,  x,  £),  (Tright{p,  x,  £)),  where  p  is  the  parameterization  describing  the 
shape  of  the  inputs. 


2.2.1  Tiered  Optimization 

An  overview  of  the  tiered  optimization  technique  is  shown  in  Figure  2.  The 
boundary  state  constraints  are  satisfied  in  the  inner  loop  by  correcting  the 
curvature  and  velocity  command  profile  parameters  using  a  model-predictive 
trajectory  generator.  The  posture  optimization  is  governed  by  the  outer  loop, 
which  modifies  the  commanded  sidearm  angles  based  on  the  accumulated 
attitude  of  the  generated  trajectory. 


XI ,  Xf,  ft(p,  X,  t),vx(p,  X,  t) 


x(t) 


Fig.  2:  The  above  diagram  illustrates  tiered  optimization  for  trajectory  generation 
with  posture  optimization.  The  inner  layer  generates  trajectories  that  satisfy  bound¬ 
ary  state  constraints  with  the  posture  parameters  determined  by  the  outer  layer. 
The  outer  layer  optimizes  the  posture  parameters  by  maximizing  the  stability  of  the 
vehicle  over  the  candidate  trajectory  x(t) 
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The  outer  layer  of  the  optimization  determines  parameters  satisfying  the 
relation: 


X  —  / (x?  ft,  U ,  Cleft-,  bright)  (1) 

p  =  a,rgmmG(^2f(xt-1,K,v,aieft(p,x,t),aright(p,x,t))At)  (2) 

P  t 

The  cost  functional  G  scores  the  execution  of  the  controls  parameterized 
by  p.  Gradient  descent  is  used  to  determine  the  minimum  cost  setting  of 
the  chassis  sidearm  control  parameters.  The  inner  layer  of  the  optimization 
is  the  constrained  form  of  the  model-predictive  trajectory  generator  in  [4] 
to  produce  continuous  trajectories  which  consider  the  shape  of  the  sidearm 
control  profiles.  In  this  formulation,  the  inputs  associated  with  steering  and 
velocity  along  the  trajectory  are  not  assumed  to  be  independent  of  those 
governing  the  configuration  of  the  chassis.  As  the  configuration  iteratively 
adjusts,  traction  can  improve,  motion  over  rough  terrain  can  alter,  and  the 
inputs  required  to  satisfy  the  boundary  state  constraints  may  change. 


2.2.2  Predictive  Motion  Model 

The  predictive  motion  model  is  used  to  estimate  the  vehicle  state  response 
to  the  control  inputs.  One  such  example  is  a  predictive  motion  model  based 
on  the  Scarab  mobile  robot  used  for  the  experiments  described  in  Section  3. 
It  has  a  mass  of  approximately  300kg,  has  a  maximum  speed  of  0.04  m/s, 
and  is  assumed  to  be  quasi-static.  The  kinematic  model  is  an  adaptation  of 
the  work  for  a  planetary  rover  [11]  that  represents  Scarab  through  a  series 
of  Denavitt-Hartenberg  Parameters.  To  represent  the  configuration  of  the 
chassis,  the  passive  major  chassis  angle  (/ 3 )  and  the  angles  of  the  chassis  side 
arms  {(Jie  ft  bright)  are  included  in  the  vehicle  state  (x).  The  passive  chassis 
angle,  roll,  pitch,  and  elevation  were  determined  as  the  vehicle  moved  through 
the  environment  using  the  technique  described  in  [11]. 


2.2.3  Cost  Function 

The  outer  optimization  layer  for  the  chassis  sidearm  angles  maximizes  the 
vehicle  stability.  A  standard  tool  for  kinematically  representing  vehicle  sta¬ 
bility  is  the  stability  polygon.  The  stability  polygon  is  the  convex  hull  of  the 
contact  points  of  the  vehicle  with  the  ground.  The  cost  function  optimized 
for  vehicle  stability  is  the  sum  over  the  trajectory  of  the  vehicle  of  the  dis¬ 
tance  between  the  center  of  the  stability  polygon  and  the  intersection  of  the 
projected  center  of  mass  with  the  stability  polygon  (Figure  3). 
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gravity  vector 


stability  polygon 


Fig.  3:  The  stability  polygon  of  the  vehicle  with  the  vehicle  center  of  mass  projected 
onto  the  convex  hull  of  the  contact  points  of  the  wheels  with  the  ground.  The  cost 
function  used  in  the  active  chassis  optimization  minimizes  the  distance  between  the 
intersection  of  the  gravity  vector  and  the  center  of  the  stability  polygon. 

As  the  projection  of  the  center  of  mass  of  the  vehicle  gets  closer  to  the 
edges  of  the  stability  polygon  the  vehicle  comes  closer  to  tipping  over.  If  the 
centre  of  mass  is  outside  the  polygon  then  the  gravity  vector  pulls  the  vehi¬ 
cle  away  from  its  contact  points  inducing  tip  over.  Minimizing  the  distance 
from  the  centre  of  the  support  polygon,  which  maximizes  the  perpendicular 
distance  to  the  edges,  increases  the  stability  of  the  vehicle.  Maximizing  the 
distance  of  the  centre  of  mass  from  the  edges  of  the  support  polygon  is  pro¬ 
portional  to  maximizing  the  normalized  energy  stability  criterion  as  described 
in  [3] .  Because  of  the  computational  simplicity  that  the  distance-to-edge  cost 
function  was  chosen  over  energy  margin  based  cost  functions. 


3  Experiments  and  Results 

The  experiments  in  this  paper  were  designed  to  demonstrate  the  ability  for 
the  tiered  optimization  technique  to  improve  the  stability  margin  for  the 
mobile  robot  operating  in  natural  environments.  The  experiments  were  con¬ 
ducted  first  in  simulation  and  then  on  the  robotic  platform  described  below. 

The  target  platform  for  these  experiments  was  Scarab  [12],  a  four-wheeled 
planetary  rover  with  an  actively  reconfigurable  chassis  built  at  the  Robotics 
Institute’s  Field  Robotics  Center  (Figure  4).  For  the  simulation  and  field 
experiments,  the  freedoms  of  each  active  chassis  elements  were  constrained 
such  that  the  wheelbase  was  between  0.94m  and  1.20m. 
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Fig.  4:  Scarab,  a  four-wheeled  prototype  lunar  prospector  with  an  independent  artic¬ 
ulating  chassis.  In  this  configuration,  the  robot  has  exploited  the  articulating  chassis 
to  minimize  roll  of  the  body,  thereby  limiting  exposure  to  tip  over. 

3.1  Simulation  Experiments 

In  order  to  test  the  ability  for  the  posture  optimization  to  maximize  stability 
over  the  perceived  environment  a  hill  was  chosen  for  this  experiment  as  it 
most  resembles  the  terrain  that  is  expected  to  be  encountered  during  crater 
wall  navigation.  Unlike  the  field  experiment  the  trajectory  generator  was 
given  an  omniscient  view  of  the  terrain. 

For  consistency  with  the  latter  field  experiments,  the  absolute  roll  of  the 
vehicle  is  used  instead  of  the  stability  margin.  This  is  due  to  the  lack  of 
sensing  of  the  major  bogie  angle  on  the  target  platform,  which  prevents  direct 
computation  of  the  stability  criterion. 

Figure  5  shows  that  the  reconfigurable  chassis  provides  a  trajectory  with 
roll  that  is  consistently  lower  than  that  of  the  fixed  chassis.  On  average  the 
roll  of  the  vehicle  with  the  active  chassis  is  twenty-eight  percent  of  the  roll  of 
the  vehicle  with  the  fixed  chassis.  Figure  5  shows  the  settings  of  the  chassis 
sidearm  angles  as  the  cost  of  the  trajectory  is  reduced.  Over  one  hundred 
iterations  were  required  for  the  sidearm  angle  parameters  to  converge,  this 
can  be  reduced  by  increasing  the  step  size  of  the  gradient  descent  algorithm 
used  to  optimize  the  parameters. 


3.2  Field  Experiments 

The  field  experiments  were  conducted  in  a  natural  terrain  similar  to  the  previ¬ 
ously  described  simulation  experiments.  Instead  of  assuming  perfect  knowl¬ 
edge  of  the  terrain  shape,  an  active  range  sensing  perception  system  was 
used.  A  trimesh  representing  the  terrain  shape  was  estimated  by  processing 


8  P.M.  Furlong,  T.M.  Howard,  D.  Wettergreen 


(a) 


Chassis  Sidearm  Angle  vs.  Number  of  Optimization  Iterations 


(b) 


Fig.  5:  The  cost  of  the  simulated  run  of  the  vehicle.  The  absolute  roll  experienced 
by  Scarab  with  and  without  posture  optimization  is  show  in  (a).  The  changes  to  the 
sidearm  angle  as  a  function  of  optimization  iteration  are  shown  in  (b). 

the  observed  point  cloud.  The  triangulated  meshes  were  used  for  planning 
trajectories  and  simulating  executions  of  planned  actions.  For  this  experi¬ 
ment,  the  predictive  motion  model  was  assumed  to  be  entirely  kinematic, 
although  generally  any  model  of  dynamics  or  wheel/terrain  interaction  could 
be  applied. 

As  in  the  previous  experiment,  vehicle  roll  was  taken  as  an  indicator  of 
stability.  In  this  experiment  the  robot  was  position  on  a  slope  and  two  trajec¬ 
tories  were  planned  and  executed.  The  first  trajectory  assumed  a  fixed,  nom¬ 
inal  chassis  configuration.  The  second  trajectory  optimized  the  input  profiles 
governing  the  vehicle  posture.  Each  motion  plan  execution  originated  from 
the  same  initial  state,  as  verified  by  the  onboard  inertial  navigation  system. 

Figure  6  present  the  vehicle  roll  for  fixed  and  adaptive  chassis  while  driving 
over  the  varied  terrain  over  two  different  courses.  On  average,  the  predictively 
controlled  chassis  reduces  the  roll  of  the  vehicle  body  while  traversing  the 
sloped  terrain. 
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Fig.  6:  The  roll  of  the  vehicle  while  driving  up  a  slope.  The  orange  line  represents 
the  roll  of  the  vehicle  using  the  active  chassis  while  the  blue  line  represents  the  roll 
of  the  vehicle  using  a  fixed  chassis. 


4  Conclusion 

The  presented  technique  optimizes  wheelbase  inputs  for  a  mobile  robot  with 
an  actively  reconfigurable  chassis  to  improve  its  stability.  Increased  stability  is 
important  on  slopes  or  when  traversing  rough  or  uneven  terrain.  By  reasoning 
deliberatively  about  actions  taken,  the  reachability  of  stable  configurations 
and  the  consequences  of  these  actions  can  be  determined.  Separating  the 
trajectory  generation  and  posture  optimization  with  a  tiered  approach  can 
efficiently  determine  solutions  which  satisfy  the  two-point  boundary  value 
problem  while  optimizing  an  underconstrained  portion  of  the  system.  While 
this  technique  produces  only  locally  optimal  solutions  in  the  continuum  the 
low-order  parameterized  functions  resist  problems  of  numerous  local  optima. 

There  are  several  logical  extensions  to  the  work  presented  in  this  paper. 
Uncertainty  in  the  terrain  model  must  be  addressed  and  it  is  recommended 
that  the  trajectory  generator  be  coupled  with  a  online  system  that  adjusts 
the  vehicle’s  configuration  to  maximize  instantaneous  stability.  Next  is  to 
integrate  a  higher  fidelity  predictive  motion  model  that  simulates  the  effect 
of  wheel/terrain  interaction  and  a  dynamic  (rather  than  kinematic)  vehicle 
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model.  The  application  of  actively  reconfigurable  chassis  trajectory  optimiza¬ 
tion  to  sampling-based  navigators  is  important  because  it  obviates  the  need 
to  sample  the  space  of  chassis  sidearm  inputs.  Lastly,  this  technique  can  be 
used  in  a  motion  planning  structure  to  generate  edges  that  consider  the  shape 
of  the  terrain  and  the  configuration  of  the  active  chassis. 
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Hirning  Efficiency  Prediction  for  Skid  Steer 
Robots  Using  Single  Wheel  Testing. 

Daniel  Flippo,  Richard  Heller  and  David  R  Miller 


Abstract  To  date,  most  field  robots  use  wheels  as  their  means  of  locomotion  (es¬ 
pecially  true  of  planetary  exploration  robots).  In  many  cases  these  robots  are  re¬ 
quired  to  travel  significant  distances,  with  limited  power,  and  over  rough  terrain.  All 
of  which  make  wheels  a  major  component  contributing  to  their  performance.  It  is 
through  experimentation  and  iteration  that  effective  wheel  design,  for  a  given  rover 
in  a  given  mission,  can  be  achieved.  To  do  this,  the  SWEET  (Suspension  and  Wheel 
Evaluation  and  Experimentation  Testbed)  simulates  the  rover  environment  using  a 
single  wheel  methodology.  The  wheels  currently  being  tested  belong  to  the  SR2 
skid  steer  Mars  rover  designed  and  built  at  the  University  of  Oklahoma.  Simulating 
a  skid  steer  turn  with  SWEET  is  achieved  by  varying  the  spinning  rate  of  the  plat¬ 
form  under  the  wheel,  which  is  rotating  at  a  certain  rate,  and  recording  the  forces 
incurred.  These  forces  interact  in  such  a  way  that  the  relevant  mobility  properties  for 
a  rover  can  be  predicted.  This  experimentation  method  allows  for  cheap  and  timely 
iterative  single  wheel  design. 


1  Introduction 


Compared  with  automotive  wheels  very  little  research  has  been  done  in  the  area 
of  interplanetary  wheel  design.  To  fill  the  gap  in  the  understanding  of  rover  wheel 
design  and  wheel  to  soil  interaction,  testing  machines  have  been  designed  by  various 
institutions.  In  1971,  NASA  tested  a  single  Lunar  Rover  Vehicle  wheel  on  a  testing 
device  called  a  dynamometer  system  [6]  and  now  uses  devices  such  as  the  variable 
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terrain  tilt  platform  (VTTP),  at  JPL,  to  gain  a  better  understanding  of  entire  rover 
systems  in  a  sloped  environment.  The  VTTP  is  a  16  x  16  ft  table  that  can  tilt  up  to  25 
degrees  and  can  be  left  bare  or  covered  with  terrain  [4]  but  is  meant  to  incorporate 
the  total  rover  assembly  and  is  used  in  a  design  validation  role  rather  than  an  iterative 
design  role.  At  the  Massachusetts  Institute  of  Technology  a  testing  device  (FSRL) 
tests  a  single  driven  wheel  through  different  mediums  to  better  understand  wheel 
to  soil  interaction  [3].  A  similar  device  is  used  at  Tohoku  University  to  refine  rover 
steering  and  other  parameters  [9].  Other  comparable  devices  test  wheels  for  Earth’s 
terrain  are  [5,  8,  2].  The  University  of  Oklahoma’s  testing  apparatus  named  SWEET 
[1]  is  unique  in  that  it  allows  for  true  turn  testing. 

All  these  test  beds  allow  the  simulation  of  aspects  of  real-life  operations.  Full 
assembly  test  beds  are  more  difficult  and  expensive  to  use  since  they  require  the  full 
rover,  a  full  compliment  of  wheels,  and  much  more  space.  Issues  with  the  wheel 
design  may  also  be  conflated  with  other  aspects  of  the  rover  design  when  a  full 
system  is  tested,  making  iterative  improvement  of  the  wheel  more  difficult.  Single 
wheel  testing  machines,  on  the  other  hand,  allow  a  designer  to  iteratively  design 
a  wheel  in  a  much  less  costly  and  timelier  fashion  than  full  assembly  testing.  For 
these  single  wheel  testing  machines  to  be  of  any  use,  the  data  that  they  give  must 
have  some  significance  in  the  real  world.  Their  performance  in  the  single  wheel 
testing  machine  must  transfer  to  predict  the  behavior  exhibited  on  a  multi-wheel 
rover  doing  typical  maneuvers  in  field  conditions. 

Skid  steering  turn  performance  is  an  example  of  a  typical  maneuver,  beyond  the 
domain  of  most  single  wheel  test  systems.  If  it  can  be  demonstrated  that  a  model  can 
transform  data  from  a  single  wheel  test  to  predict  the  turning  efficiency  of  a  rover, 
then  skid  steer  turning  is  one  more  behavior  that  can  be  studied  and  improved  upon 
cheaply  and  thoroughly  using  the  single  wheel  testing  method.  This  paper  describes 
a  method  to  test  skid  steer  rover  wheels  on  a  single  wheel  test  apparatus  and  then 
predict  its  real  world  turning  performance  on  a  skid  steer  rover.  The  predictions  are 
then  compared  to  full  assembly  tests  fitted  with  four  identical  wheels  to  the  one 
tested. 


2  Theory  of  Single  Wheel  to  Full  Assembly  Correlation 

Skid  steering  is  an  unintuitive  process  in  that  there  are  multiple  forces,  due  to  the 
lateral  sliding,  that  must  take  place  for  a  skid  steer  rover  to  turn.  When  a  rover 
initiates  a  turn  its  rotation  (in  the  X  —  Y  plane)  will  accelerate  up  to  a  certain  spin 
rate  Q  (Fig.  1)  at  which  point  it  will  stabilize  and  the  moment  about  its  center  ( M0 ) 
will  equal  zero. 


ZM0  =  0  . 

Z(/yRcos(®))  —  Z(/y?sin(0))  =  0  . 
Fy  =  Fx  tan(0)  . 


(1) 

(2) 

(3) 
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Fig.  1  Skid  Steer  Force  Body  Diagram 


Equation  3  describes  a  relationship  between  Fx  and  Fy  at  the  turning  equilibrium 
point  and  is  dependent  upon  the  rover  geometry  (0).  If  the  rover  were  slender  (Fig. 
2-a)  then  0  would  be  larger  than  |  and  Fx  would  be  much  smaller  than  Fy.  If  0  = 
|  then  Fy  =  oo.  This  would  mean  that  no  matter  how  much  force  a  wheel  could 
exert  on  the  ground  the  rover’s  spin  rate  Q  would  always  be  zero.  If,  on  the  other 
hand,  0  were  equal  to  zero,  as  in  Fig.  2-b,  then  Fy  (which  is  really  the  net  force  of 
power  and  friction)  would  be  equal  to  zero.  This  configuration  is  better  known  as 
Ackerman  steering  which  means  that  the  wheels  have  no  lateral  slip  and  if  there  is 
no  longitudinal  slip  then  the  turning  rate  can  be  calculated  by  eq.  4. 

cor 

C2  =  —,Fy  =  0.  (4) 

where  (0  is  the  wheel  angular  velocity  in  radians  per  second,  r  is  the  wheel  radius, 
and  R  is  the  distance  from  the  center  of  the  wheel  to  the  center  of  rotation  of  the 
rover. 

Equation  4  refers  to  the  ideal  turning  rate  Qideal  without  longitudinal  slipping 
for  an  Ackerman  steering  geometry.  To  calculate  Qideal  for  a  skid  steer  rover  (0  ^ 
0),  0  must  be  taken  into  account  and  is  reflected  in  eq.  5.  Qideal  refers  to  the 
theoretical  maximum  a  skid  steer  rover  can  spin,  but  Fy,  at  Qideal,  is  still  not  zero. 

(Or 

&IDEAL  =  -r-COS (0),Fy  7^  0  .  (5) 

K 

To  find  the  value  of  &Fy= o,  which  is  the  spin  rate  at  which  there  is  no  longer  a  net 
force  in  the  Y  direction,  the  longitudinal  velocity  (VyXFig.  2b)  of  the  ground  under 
the  wheel  must  be  equal  to  the  velocity  of  the  wheel  rim  (cor)  therefore  making 
Fy  =  0  (no  slip).  Equation  10  explains  this  relationship.  Loose  soils  that  cause  more 
viscous  friction,  such  as  sand,  will  alter  the  slope  of  the  force  curves  and  decreasing 
the  Q Predicted  and  QFy=o  values. 


Vy  =  (Or  . 


(6) 
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(a) 


Fig.  2  a)  Skid  Steer  Geometry  Configurations;  b)  Skid  Steer  Kinematics 


—  COS {®^)V ground  • 


Vground  —  £2R 


cor  =  QR cos(0)  . 

^  cor  cor 

Fy=0  ~  7?cos(@)  “  ~RX  ' 


(7) 

(8) 
(9) 

00) 


Fig.  3  Force  vs  Spin  Rate  Example 


For  the  right  front  wheel  of  a  rover  pivoting  in  the  counter  clockwise  direction, 
the  ground  must  move  under  it  in  the  opposite  direction  and  the  relation¬ 

ship  of  the  forces  on  the  wheel,  as  the  spin  rate  (12)  of  the  ground  under  the  wheel 
increases,  can  be  shown  in  illustration  3.  When  the  simulated  rover’s  spin  rate  (Q) 
is  equal  to  zero  the  wheel  being  tested  rotates  ( co )  but  does  not  move.  This  causes 
a  force  in  the  Y  direction  which  is  just  the  kinetic  friction  (Fy  =  flgN)  between  the 
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wheel  and  ground.  For  a  blank  wheel  on  smooth  ground  there  is  no  Fx  at  F2  =  0, 
but  for  a  treaded  wheel  Fx  could  be  non-zero  which  will  be  one  value  to  focus  on 
when  testing  new  wheels.  As  the  spin  rate  of  the  ground  under  the  wheel,  increases 
Fx  increases  while  Fy  decreases  until  they  intersect.  This  meeting  point  would  rep¬ 
resent  the  equilibrium  spin  rate  (Q predicted )  of  a  square  rover  (0  =  |).  To  find  the 
equilibrium  point,  of  a  rectangular  rover,  eq.  3  adds  the  needed  constraint  between 
Fx  and  Fy.  For  the  SR2  [7]  rover  0  =  .8477  rad  when  combined  with  eq.  3  simplifies 
to  eq.  11. 


Fy  =  1.1337^  .  (11) 

In  essence  what  we  are  doing  is  operating  the  wheel  and  the  ground  under  the 
wheel  independently,  by  varying  the  ground  speed  (Q)  while  keeping  the  wheel 
spin  rate  ( (0 )  constant,  and  observing  the  behavior  of  the  forces  acting  on  the  wheel. 
When  the  forces  satisfy  eq.  1 1  the  corresponding  £2  is  the  predicted  rover  spin  rate. 
In  Fig.  3  this  relationship  gives  a  point  just  right  of  the  cross  point  and  corresponds 
to  a  Q predicted  value  which  is  the  predicted  spin  rate  of  a  rover  fitted  with  four  wheels 
with  the  same  orientation,  relative  to  the  rover  center,  and  identical  tread  to  the  wheel 
tested. 

It  should  be  noted  how  a  rover’s  geometry  affects  this  relationship.  As  0  in¬ 
creases  above  ^  the  rover  is  more  slender  (Fig.  2)  which  makes  turns  less  efficient 
and  ^predicted  becomes  smaller.  If,  on  the  other  hand,  0  decreases  its  Q predicted 
value  increases  until  0  =  0  and  Q predicted  =  which  is  an  Ackerman  steering  ge¬ 
ometry. 


3  Validation  Experiments 

To  do  single  wheel  testing  the  Suspension  and  Wheel  Experimentation  and  Evalu¬ 
ation  Testbed  (SWEET)  is  used.  The  testbed  (Fig.  4)  has  a  10  x  10  ft  footprint  and 
a  weighted  drop  down  test  leg,  incorporating  a  driven  wheel  and  a  six-axis,  force 
torque  sensor  which  stays  stationary  in  the  X  and  Y  directions  but  allows  movement 
along  the  Z-axis  via  a  counterbalance  system. 

SWEET  differs  from  most  testbeds  in  that  the  table  can  move  in  the  X  and  Y 
directions  underneath  the  test  stand,  as  well  as  rotate  in  the  X,  7-plane.  This  added 
advantage  gives  the  apparatus  the  unique  ability  to  measure  forces  and  torques  in  a 
true  turn  allowing  the  study  of  skid  steer  turning. 

SWEET  was  programmed  to  simulate  a  skid  steer  turn  and  fitted  with  a  .109 
meter  diameter  blank  wheel  on  simple  carpet  (Fig.  4).  Parameters  were  set  to  mimic 
our  in-house  four  wheel  skid  steer  rover’s  (SR2  [7])  geometry  and  loading.  The  test 
variables  were  wheel  spin  rates  (co  =  .3,  .4,  and  .5  ^)  and  turn  rates  (Q  =  0,  -.01, 
-.02 . -.12  ^)  with  5  trials  of  each.  Post  processing,  of  the  data,  was  done  with 
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Fig.  4  a)  SWEET  Testbed;  b)  SR2  rover  spin  rate  testing 


several  C  programs  that  averaged  all  trials,  performed  2nd  and  3rd  order  regression 
curve  fitting,  and  calculated  Qsweet Predicted- 

The  results,  for  SWEET’s  skid  steer  turn  test,  are  shown  in  Figs.  5,6,  and  7. 


SR2  (Fig.  4)  was  then  fitted  with  four  blank  wheels  and  turned  on  the  same  carpet 
to  validate  the  results.  Tests  were  done  for  three  different  wheel  speeds  (a)  =.3,  .4, 
and  .5  measuring  the  spin  rate  of  the  rover  during  the  test  (by  measuring  the 
angle  between  an  onboard  laser  level  mark  and  the  initial  position  and  dividing  by 
the  elapsed  time),  which  are  given  in  table  1  along  with  Qsweet Predicted  and  per¬ 
centage  error.  These  results  show  a  definite  validation  of  the  SWEET  single  wheel 
test  within  3%. 
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Fig.  7  Results  for  co  =  .5  ^ 


Table  1  QSR2  and  G-sweet  Predicted  results  in  ^ 


CO 

&SR2 

&  SWEET  Predicted 

Error 

0.3 

-.042 

-.042 

0% 

0.4 

-.056 

-.057 

1.8% 

0.5 

-.066 

-.064 

3.0% 

4  Skid  Steer  Experiments  with  Non-Blank  Wheels 


In  considering  a  non-blank  wheel,  particularly  a  directional  patterned  wheel  such  as 
Fig.  8  there  is  a  possibility  of  a  force  along  the  X  axis  induced  by  the  tread  pattern. 
If  the  wheel  is  mounted  on  the  correct  side  then  the  additional  force  will  benefit 
the  turning  efficiency  by  offsetting  the  frictional  force  produced  by  the  turn.  The 
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theoretical  ideal  turning  rate  for  a  directional  treaded  wheel  has  to  include  any  Vx 
produced  by  the  tread. 


Fig.  8  a)  Kinematic  explanation  of  treaded  wheel;  b)Measuring  a  on  a  treaded  wheel 


Vt  =Vycos(@)  +Vxsin(0)  . 

(12) 

Vy  =  cor . 

(13) 

Vt  =  QR. 

(14) 

Qideal  =  4(©rcos(0)  +  Vxsin(0)) . 

K 

(15) 

if  Qideal  were  related  to  the  tread  design  only  (such  as  a  bolt  screwing  into  a 
nut)  and  ignored  any  soil  interaction  Vx  would  be  a  function  of  0),  a,  and  r  (equation 
16  and  Fig.  8).  Which  would  give  the  Qideal  in  equation  17. 


(16) 

(17) 


Two  directional  patterned  wheels,  with  diameter  of  .102  meters  (Fig.  8b),  were 
tested  in  SWEET.  Figures  9  and  10  show  the  performance  of  the  two  oppositely 
patterned  wheels  dubbed  ’left’  and  ’right’  which  correspond  to  their  proper  orienta¬ 
tion  on  the  rover.  Again  the  tests  were  run  simulating  the  right  front  side  of  a  rover 
turning  in  a  counter-clock- wise  fashion.  The  tests  were  run  on  padded  carpet,  and 
not  a  hard  surface,  to  focus  on  how  the  tread  itself  interacts  with  the  surface  and  the 
treads  affect  on  turning  performance.  Figure  10  shows  the  results  of  a  left  wheel  in 
that  position  producing  a  £2  sweet  Predicted  value  of  -.037  ^  while  the  right  wheel 
gives  a  Q predicted  value  of  -.050  ^  (table  2).  The  left  wheel  can  be  visualized  as 
trying  to  screw  itself  to  the  right  fighting  against  the  turn  when  placed  on  the  right 
side,  the  right  wheel  is  trying  to  screw  itself  left  benefiting  the  turn. 
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Fig.  10  Results  for  left  treaded  wheel  rotating  at  (O  =  .3  ^  in  the  right  front  position 
Table  2  flsR2  and  Q. predicted,  results  for  treaded  wheels 


Wheel 

&SR2 

^ Predicted 

Error 

Left 

-.0367 

-.0374 

1.9% 

Right 

-.0454 

-.0496 

9.25% 
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5  Conclusions 

This  paper  discusses  and  demonstrates  a  method  that  allows  the  results  from  a  single 
wheel  test  to  be  used  to  predict  turning  efficiency  for  a  full  assembly  skid  steer 
rover.  Three  different  wheels  were  tested  and  predicted  turn  rates  were  within  10% 
of  full  assembly  tests  which  probably  can  be  refined  by  increasing  the  sample  size. 
Future  work  will  be  to  test  on  sand  and  other  terrain,  test  and  evaluate  interesting 
wheel  types,  and  iterate  tread  design  on  conventional  wheels  to  better  ascertain  a 
wheel’s  performance  on  different  media  all  of  which  without  the  cost  and  time  of 
full  assembly  tests. 
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Field  Experiments  in  Mobility  and  Navigation 
with  a  Lunar  Rover  Prototype 


David  Wettergreen1,  Dominic  Jonak,  David  Kohanbash,  Scott  Moreland, 
Spencer  Spiker,  and  James  Teza 


Abstract  Scarab  is  a  prototype  rover  for  lunar  missions  to  survey  resources,  par¬ 
ticularly  water  ice,  in  polar  craters.  It  is  designed  as  a  prospector  that  would  use  a 
deep  coring  drill  and  apply  soil  analysis  instruments.  Its  chassis  can  transform  to 
stabilize  its  drill  in  contact  with  the  ground  and  can  also  adjust  posture  to  ascend 
and  descent  steep  slopes.  Scarab  has  undergone  field  testing  at  lunar  analogue  sites 
in  Washington  and  Hawaii  in  an  effort  to  quantify  and  validate  its  mobility  and 
navigation  capabilities.  We  report  on  results  of  experiments  in  slope  ascent  and 
descent  and  in  autonomous  kilometer-distance  navigation  in  darkness. 


Introduction 

To  discover  and  measure 
the  resources  of  the  moon, 
robotic  systems  will  have  to 
survive  extremes  from  blaz¬ 
ing  sunlight  to  frigid  dark¬ 
ness  as  well  as  dust,  vacuum, 
and  isolation.  Scarab  is  a 
prospecting  rover  developed 
to  perform  the  necessary  sci¬ 
ence  operations  to  locate 
volatiles  and  validate  in  situ 
resource  utilization  methods. 

[Sanders09]  (Fig.  1)  It  is  a 
terrestrial  concept  vehicle 
designed  to  deploy  a  deep 
coring  drill  and  to  transport 
soil  analysis  instruments.  The  vehicle  design  employs  a  passive  kinematic  sus¬ 
pension  with  active  posture  adjustability.  Its  chassis  can  lower  to  stabilize  a  cor¬ 
ing  drill  in  contact  with  the  ground  and  can  also  adjust  to  control  roll,  meaning 


Figure  1 .  Scarab  lunar  rover  prototype  on  uncon¬ 
solidated  sandy  soil  in  eastern  Washington 
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rotation  about  its  longitudinal  axis,  by  independently  adjusting  its  side-frames. 
This  allows  it  to  drive  cross-slope  and  turn  switchbacks  to  better  ascend  and  de¬ 
scend  unconsolidated  soil. 

Scarab  is  intended  to  operate  on  and  within  lunar  craters,  particularly  in  polar 
regions.  Because  the  interior  slopes  and  crater  floor  are  sometimes  in  shadow,  or 
in  some  cases  in  permanent  darkness,  active  sensing  methods  are  needed  for  ter¬ 
rain  modeling  and  autonomous  navigation.  Scarab  employs  laser  range  scanners 
with  autonomous  navigation  algorithms  to  build  models  of  the  surrounding  terrain 
to  detect  obstacles  and  then  determine  efficient  and  safe  paths. 

In  this  paper  we  review  results  from  field  experiments  at  Moses  Lake  Dunes, 
Washington  and  Mauna  Kea,  Hawaii  to  measure  and  verify  the  prototype  rover’s 
ability  to  meet  the  demands  of  a  lunar  polar  crater  prospecting  mission. 


Rover  Configuration 


Scarab  was  conceived  as  a  work  machine  with  a  serialized  mission:  drive, 
charge  batteries,  drill,  charge  again,  analyze  soil  samples,  charge  and  repeat.  The 
number  of  repetitions  might  be  25,  leading  to  25  kilometers  of  traverse,  25  cores, 
and  25  sites  surveyed.  For  some  craters,  100  repetitions  might  be  more  desirable 
to  characterize  the  environment  and  resources. 

There  are  many  factors  effecting  the  rover  configuration  but  the  drill  mecha¬ 
nism  and  its  operation  dominate.  The  requirement  to  transport  and  stabilize  a  deep 
coring  drill  literally  became  central  to  the  design  while  requirements  for  ascent 
and  descent  in  cratered  terrain  shaped  many  aspects  and  fine  details. 

Drilling  requires  a  stiff  platform  into  which  thrust  loads,  torques  and  vibrations 
are  transmitted  and  hole  alignment  is  maintained.  Placement  of  the  drill  in  line 
with  the  vehicle’s  center-of-mass  maximizes  the  mass  that  can  be  applied  in  down 
force.  (Fig.  2)  Drilling  operations  receive  three  benefits  of  this  feature;  first,  low¬ 
ering  the  chassis  allows  the  full  stroke  of  the  drill  to  be  used  in  the  soil  resulting  in 
mass  savings  overall.  Secondly,  the  rover  can  lean  and  therefore  re-stabilize  and 
place  the  rover  center-of-mass  over  the  drill  core.  Lastly,  under  low  gravity  condi¬ 
tions,  the  drill  torques  are  counteracted  by  the  increased  leverage  arm  created  by 
spreading  the  rover  wheelbase. 

The  rationale  for  the  vehicle  weight  and  size  is  based  on  the  1  m  long,  3  cm 
diameter  drill  that  is  likely  to  be  employed  in  a  lunar  mission.  Not  only  does  the 
rover  have  to  support  the  drill  but  also  it  must  provide  sufficient  weight  against 
which  the  drill  can  react  its  downward  thrust  and  torque  about  the  bit.  Drill  thrusts 
are  expected  to  reach  250  N  and  50  Nm  torque.  The  system  weight  on  lunar  sur¬ 
face  must  react  drilling  250  N  downforce  and  maintain  150  N  on  wheels  for  stabil¬ 
ity  against  uplift  and  spin,  therefore  total  weight  on  lunar  surface  must  be  greater 
than  400  N.  The  weight  in  lunar  gravity  (400  N  /  1.622  m/s2  =  250  kg)  leads  to  a 
minimum  250  kg  vehicle  mass.  [Bartlett08] 
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The  Scarab  rover’s  chas¬ 
sis  and  suspension  was  de¬ 
signed  around  the  drill.  This 
component  of  the  payload  is 
significant  in  mass  (50  kg) 
and  imposes  forces  on  the 
chassis  during  transport  and 
while  interacting  with  the 
ground. 

Scarab’s  chassis  allows  it 
to  passively  conform  to  the 
terrain.  The  suspension  has 
active  and  passive  elements 
for  improved  traction  on 
slope  terrain.  The  active 
element,  as  previously  dis¬ 
cussed  with  respect  to  drill¬ 
ing,  allows  the  rover  to 
level  the  body,  leading  to 
increased  stability  and  trac¬ 
tion  efficiency.  The  passive 
element,  sometimes  called 
an  averaging  (or  differenc¬ 
ing)  linkage  provides  a  me¬ 
chanical  release  allowing 
the  two  rover  suspension 
side-frames  to  pivot  inde¬ 
pendently. 

The  averaging  linkage 
ensures  the  body  is  pitch- 
averaged  between  the 
rocker  arms.  Scarab's  body 
has  three  contact  points.  On 
either  side,  the  body  is  con¬ 
nected  to  the  pivot  in  the 
rocker  arms.  On  top,  the 
body  hangs  from  the  differ¬ 
encing  linkage.  This  linkage 
runs  across  the  top  of  the 
body  and  also  connects  to 
the  rocker  arms.  Scarab 
actively  controls  its  roll 
using  the  rocker  arms  by 
changing  the  height  of  each 
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Figure  2.  Scarab  rover  configuration  showing 
placement  of  sensors,  avionics,  and  payload. 
There  are  drive  motors  in  each  of  four  wheels  and 
two  linkages  for  adjusting  sideframe  height.  An 
averaging  linkage  allows  all  four  wheels  to  main¬ 
tain  ground  contact  in  rough  terrain. 


Table  1 .  Scarab  Rover  Specifications 


Mass 

280  kg 

Weight 

2740  N  Earth  surface 

450  N  Lunar  surface 

Locomotion  speed 

3  -  6  cm/s 

Wheel  diameter 

65  cm 

Track  width 

1.4  m 

Wheelbase 

0.8-  1.4  m 

1 .2  m  nominal 

Aspect  ratio 
(track/wheelbase) 

1 : 1 .0  low  stance 

1:1.2  nominal  stance 

1 : 1 .7  high  stance 

CG  planar  location 

On  geometric  center 

CG  height 

0.48  m  low  stance 

0.64  m  nominal  stance 

0.74  m  high  stance 

Static  pitch-over 

56°  low  stance 

43°  nominal  stance 

30°  high  stance 

Static  roll-over 

61°  low  stance 

53°  nominal  stance 

49°  high  stance 

Maximum  straddle 

0.55  m 

Minimum  straddle 

0.00  m  (ground  contact) 
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side  independently,  thus  controlling  the  roll.  In  contrast  the  pitch  is  passive. 
Scarab's  wheels  conform  to  the  terrain,  which  rotates  the  rocker  arms  and  swivels 
the  differencing  linkage.  The  linkage  is  constructed  such  that  the  body  is  forced  to 
move  up  or  down  by  half  the  angle  between  the  two  rocker  arms.  As  the  center-of- 
mass  of  the  rover  is  located  midway  between  the  side  frames,  equal  loading  occurs 
on  all  four  wheels  even  on  drastically  uneven  terrain. 


Mobility  Experiments 


Testing  Scarab  in  the  field  has  been  critical  in  proving  the  concept  for  lunar 
mobility  and  quantifying  performance.  Experiments  have  been  conducted  in  nu¬ 
merous  conditions  with  several  findings  of  importance.  However  it  is  understood 
that  continuing  experimentation  is  needed  to  provide  the  data  for  a  fully  validated 
performance  model  and,  most  important,  to  enable  extrapolation  of  terrestrial  re¬ 
sults  to  the  lunar  environment. 

Moses  Lake  Sand  Dunes  in  Washington  was  chosen  as  a  test  site  for  its  varied 
terrain  (slopes,  pits,  etc.),  low  moisture  content,  varied  soil  types  (strengths,  size 
distribution)  and  wide  open  space.  These  qualities  provided  grounds  for  mobility 
traction  testing  and  long  distance  dark  navigation  traverses.  Steep  slope  ascent/ 
descent  in  loose  soil  and  tests  of  new  slope  climbing  techniques  and  algorithms 
were  the  focus  of  these  field  tests  in  June  2008. 


Another  lunar  analogue  site,  located  on  Mauna  Kea,  Hawaii,  is  at  high  altitude 
with  dry,  deep,  basaltic  volcanic  ash  allows  repeated  mobility  and  navigation  ex¬ 
periments.  The  soil  composition  and  mechanical  properties  at  this  site  were  ideal 
for  the  regolith  sampling  hardware  experiments.  The  objectives  of  these  tests  in 
November  2008  were  to  demonstrate  roving,  drilling,  sample  acquisition,  process¬ 
ing  and  analysis.  The  rover  was  able  to  autonomously  traverse  kilometers  of  rough 
terrain,  inspect  a  drill  site,  drill  to  1  m  depth,  process  the  core  samples  and  analyze 
the  composition  of  the  captured  soil  and  demonstrate  extraction  of  water  from  soil. 

Characterization  of  Scarab  as  a  system  for  difficult  terrain  mobility  was  first 
quantified  in  the  laboratory  in  statics  tests  and  in  sandboxes.  [Bartlett08]  The  in¬ 
dependently  actuated  rocker  arms  of  the  Scarab  rover  allow  for  actively  controlled 
center-of-mass  shifting.  The  JPL  Sample  Return  Rover  has  similar  capability 
[IagnemmaOO].  Benefits  of  this  feature  include  decreased  slip  during  cross-slope 
maneuvers.  Scarab  was  tested  normal  to  the  slope  and  leaning  to  maintain  vertical 
posture  with  cross-slope  of  10°,  15° and  20°.  A  surveying  total  station  tracked  a 


Table  2. 

Downhill  Slip 

Slope 

Normal 

Leaning  Change 

10° 

6% 

2% 

-4% 

15° 

22% 

8% 

-14% 

20° 

37% 

15% 

-22% 

prism  on  the  rover  to  millimeter  accuracy  and 
recorded  instantaneous  slip  measurements.  The 
outcome,  expressed  as  percentage  downhill  slip 
with  respect  to  cross-slope  distance,  appears  in 
Table  2. 

The  considerable  decrease  in  downhill  slip  (2.5x 
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at  20  °  incline)  arises  from  increased  traction  due  to  equalization  of  wheel  loading 
in  highly  cohesive  soils  and  edging  effects  of  the  wheel  profile  in  frictional  soils. 
The  significance  of  this  outcome  lies  in  the  ability  to  descend  and  navigate  steeper 
slopes  with  while  maintaining  adequate  control  authority. 

A  widely  used  metric  for  measuring  the  total  tractive  ability  of  a  vehicle  is 
drawbar  pull.  This  is  the  value  the  vehicle  can  pull  in  a  specified  material  while 
maintaining  forward  progress.  The  maximum  drawbar  pull  value  corresponds  to 
the  inflection  in  the  load/slip  curve  where  the  soil  fails  and  the  wheel  enters  the 
high  slip  regime.  This  value  is  informative  when  comparing  different  designs  and 
also  used  for  determining  the  slope  a  vehicle  can  ascend  for  the  specified  material. 
Drawbar  pull  experiments  were  conducted  in  Washington  and  Hawaii  to  evaluate 
of  the  effects  of  rover  mass  properties,  wheel  design,  and  soil  properties  on  trac¬ 
tive  performance.  (Fig.  3)  Both  the  drawbar  pull  value  and  power  values  derived 
from  this  test  are  used  as  metrics  to  determine  performance. 

The  key  observations  are  the  range  of  tractive  values  that  occur  with  changing 
soil  properties.  For  high  bearing  strength  materials,  the  level  of  looseness  and 
compaction  does  result  in  slightly  varied  traction  and  power  (shear  strength  and 
sinkage  respectively).  The  overall  mass  also  has  little  effect  on  the  normalized 
drawbar  pull  value  (percentage  of  vehicle  weight)  although  with  extremely  low 
bearing  strength  materials,  this  does  not  hold  true  as  a  result  of  excessive  sinkage. 
The  shear  strength  comes  from  cohesiveness  and  internal  friction.  As  a  result,  the 
drawbar  pull  values  can  be  representative  of  slopes  achievable  for  only  highly 
cohesive  material  as  the  normal  loading  of  the  surface  is  constant  throughout  test¬ 
ing.  The  most  significant  effect  on  traction  has  resulted  in  wheel  design.  Experi¬ 
ments  involving  different  traction  surfaces,  wheel  diameter  and  ground  pressures 
have  shown  a  large  range  of  drawbar  pull  values.  Differences  of  50%  have  been 
achievable  through  traction  surface/grouser  modifications.  Lowering  ground  pres¬ 
sure  and  reducing  sinkage  has  moderate  effects  on  traction  but  results  in  large  dif¬ 
ferences  in  driving  power  (up  to  50%  during  experiments).  Drawbar  pull  tests  per¬ 
formed  as  lab  and  field 
experiments  have  high¬ 
lighted  wheel  design  as  a 
leading  element  in  tractive 
and  power  design  require¬ 
ments.  (Table  3.)  This  is 
important  because  wheel 
design  is  generally  inde¬ 
pendent  of  the  suspension 
design  and  can  be  opti¬ 
mized  for  traction  and 
power  efficiencies. 

Active  control  methods 
can  also  lead  to  increased 
tractive  performance. 


Automatic 

Position 

Tracking 


Variable  resistance 
provided  by  weights 
on  sled 


Pnsm  for" 

I  ' _ N  tracking  position  - 

to  detect  slip 


v 

Strain  gauge 
for  measunng 
pulled  load 


Figure  3.  Drawbar  pull  experimental  setup.  Weight  is 
added  to  the  sled  with  the  rover  in  motion  while  slip  is 
continuously  monitored. 
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Techniques  such  as  “inch- 
worming”  can  increase  the 
mobility  of  a  rover.  (Fig.  4)  To 
begin  the  cycle  of  inch- 
worming,  the  body  lowers 
while  expanding  the  wheel¬ 
base  and  rolling  the  front 
wheels  forward  while  the  rear 
wheels  remains  static.  In  the  ' 
second  half  of  the  cycle,  the 
body  raises  and  the  wheel 
base  shortens  while  the  rear 
wheels  rolls  forward  and  the 
front  wheels  remains  static. 
Non-rotating  wheels  provide  a 
fixed  point  of  reaction  with  no 
slipping.  To  achieve  these 
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Figure  4.  Conventional  rolling  versus  inch- 
worming  where  one  wheel  pair  is  synchronized  to 
the  side-arm  expansion/contraction  and  the  other 
reacts  forces  into  the  ground. 


benefits,  Scarab’s  inch-worming  algorithm  relies  on  eliminating  the  compaction 
resistances  on  two  of  the  four  wheels,  by  remaining  static  with  respect  to  ground, 
for  a  resulting  net  tractive  increase.  Experimentally  we  have  found  that  the  inch- 
worming  technique  is  best  suited  when  wheels  become  entrenched  under  high  slip. 
It  allows  the  rover  to  move  forward  (or  back  up)  out  of  this  situation. 

Actively-positioned  center-of-mass  can  also  increase  steepness  of  slopes  tra¬ 
versable:  distributing  the  load  amongst  the  rover’s  wheels  leads  to  more  efficient 
traction.  Center-of-mass  shifting  (body  leaning)  was  tested  and  heading  slip,  the 
slip  in  the  commanded  direction  with  respect  to  the  commanded  velocity,  showed 
increase  the  steepness  of  slopes  ascendable.  The  experiments  were  conducted 
with  a  25°  angle  attack  from  the  horizontal.  This  value  was  determined  experi¬ 
mentally  to  have  adequate  uphill  progress  and  low  slip.  It  was  shown  that  with  the 
transformable  suspension  of  the  Scarab  rover,  slopes  of  20°  loose,  dry,  volcanic 
ash  can  be  ascended  with  low  risk. 


Table  3. 

Soil  Depth 

Lunar  Wheel 

Rubber  Tire 

Difference 

Locomotion  Power 

7.5  cm 

100W,  100W  w/grouser 

158W 

0.58 

5.0  cm 

95W 

160W 

0.68 

2.5  cm 

95W 

103W 

0.08 

1  cm 

98W 

117W 

0.19 

Maximum  Drawbar 

Pull 

7.5  cm 

23%,  32%  w/grouser 

28% 

0.18,0.28 

5.0  cm 

24% 

32% 

0.25 

2.5  cm 

32% 

39% 

0.18 

1  cm 

33% 

50% 

0.34 
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Navigation  Experiments 
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Figure  5.  Scarab  navigating  in  darkness.  Laser 
scanner  perceives  terrain  ahead  and  an  underbody 
optical  velocity  sensor  detects  slip. 


Scarab  navigates 
autonomously  on  kilometer 
scales.  A  route  planning 
algorithm  generates  inter¬ 
mediate  goals  (typically 
with  100  m  spacing)  and 
the  operator  may  specify 
multiple  goals,  Scarab  will 
reach  each  goal  in  order. 

Scarab  uses  an  onboard 
inertial  measurement  de¬ 
vice  (Honeywell  HG1700) 
and  optical  ground  speed 
sensor  to  enable  it  to  esti¬ 
mate  position  and  velocity 
with  1  -  3  %  error  on  dis¬ 
tance  traveled.  (Fig.  5) 

Laser  ranging  provides  measurements  to  build  models  of  the  surrounding  terrain 
to  detect  obstacles  and  then  determine  efficient  and  safe  paths.  (Fig.  6) 

Scarab  periodically  scans  the  terrain  using  a  laser  rangefinder  developed  by 
Neptec  Design  Group.  [Neptec08]  Previous  autonomous  rovers  have  used  stereo 
cameras  for  high  density  terrain  observations  with  low  power.  [Wettergreen08]  In 
the  scenario  that  Scarab  addresses,  polar  lunar  craters,  there  will  be  significant 
areas  of  slope  and  crater  floor  in  shadow  and  in  some  cases,  perpetual  darkness  so 
active  sensing  devices  are  required.  Scarab  acquires  a  scan  after  appreciable  driv¬ 
ing  (more  than  3  m)  or 


turning  (greater  than  10°) 
or  after  time  has  elapsed 
(more  than  100  sec).  The 
sensor  produces  a  dense 
array  of  ranges  and  takes 
several  seconds,  so  motion 
must  stop  to  avoid  warping 
the  data.  The  navigation 
algorithms  assume  a  static 
world,  meaning  the  terrain 
does  not  change  between 
scans.  Each  3D  cloud  of 
range  points  is  incorpo¬ 
rated  into  the  terrain 
model.  The  range  points 
are  filtered  (to  remove 


Figure  6.  Mesh  terrain  model  used  to  represent  ob¬ 
stacles  and  to  evaluate  and  refine  the  path.  Modeling 
and  evaluation  iterate  to  navigate  to  the  goal. 
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noise  and  artifacts)  and 
transformed  into  the  vehi¬ 
cle’s  coordinate  frame. 
Coarse  data  reduction  on 
the  point  cloud  is  applied 
and  the  point  cloud  is 
transformed  into  a  mesh. 
The  mesh  is  then  further 
reduced  to  eliminate  re¬ 
dundant  data.  Finally  the 
mesh  is  aligned  with  prior 
data  to  generate  the  terrain 
model  that  is  used  to  iden¬ 
tify  obstacles  and  select 
the  best  path  to  the  goal. 
Many  candidate  vehicle 
motions  are  evaluated  in 
the  near-  and  far-field.  The 
near-field  analysis  in¬ 
volves  simulating  vehicle 
motion  on  the  mesh  to 
identify  collision  and  slope 
hazards  and  assess  their 
severity.  The  far-field 
analysis  applies  heuristic 
search  to  estimate  the  pro¬ 
gress  each  potential  move 
will  make  toward  the  goal. 
A  cost  function  combines 
safety  in  the  near-field 
with  progress  in  the  far- 
field. 

Our  experimental  ap¬ 
proach  has  been  to  conduct 
1  km  traverses  in  a  variety 
of  terrains  with  progressive 
improvements  to  algo¬ 
rithms.  At  both  the  Moses 
Lake  and  Mauna  Kea  sites, 
Scarab  autonomously 
completed  the  following 
objectives:  travel  over  3 
km,  perform  2  night  trav¬ 
erses  and  simulate  crater 


Figure  7.  Autonomous  descent  into  a  large  pit.  Ren¬ 
dered  views  are  terrain  model  with  path  of  rover  at 
rim,  intermediate,  and  viewing  the  floor.  Images  show 
rover  during  descent.  Scarab  discovered  a  moderate 
slope  and  reached  the  floor  autonomously. 
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descent.  Traverses  are  kilometer  scale  and  performed  after  sunset,  they  account  for 
most  of  the  total  distance  traveled  at  each  site.  Crater  descent  was  conducted  with 
a  long  (100  m)  traverse  that  included  descending  a  steep  (10°)  slope. 

Scarab  completed  a  total  of  3.6  km  in  27  traverses  in  Washington.  The  first 
dark  traverse  was  1 .2  km  with  4  interventions  due  to  sensor  faults  and  one  due  to  a 
controller  error.  These  faults  are  recoverable;  they  do  not  jeopardize  the  rover  and 
are  easily  resolved  by  resetting  a  device.  A  second  dark  traverse  used  an  alterna¬ 
tive  navigation  algorithm  [Pedersen08]  and  completed  1 . 1  km  with  two  interven¬ 
tions  due  to  localization  errors. 

Scarab  traveled  3.0  km  in  20  traverses  at  Mauna  Kea,  most  of  this  was  accom¬ 
plished  during  the  two  overnight  traverses.  The  first  dark  traverse  was  split  into 
two  parts;  after  199  m  the  traverse  was  paused  for  logistic  reasons  and  later  Scarab 
resumed  for  an  additional  779  m  before  stopping  due  to  a  software  error.  The  sec¬ 
ond  overnight  traverse  was  also  split  in  two;  the  first  part  was  312  m  and  stopped 
on  a  software  error,  the  second  was  989  m  and  ended  with  a  CANBus  fault.  All  of 
these  errors  are  recoverable  remotely. 

At  each  site,  Scarab  autonomously  completed  a  simulated  crater  descent  using 
available  analogue  terrain.  At  Moses  Lake,  Scarab  drove  into  a  9  m  deep  pit  with 
10°-20°sloped  sides.  (Fig.  7)  This  was  safely  accomplished  including  two  undi¬ 
rected  switchback  maneuvers.  At  Mauna  Kea,  Scarab  repeatedly  drove  down  a 
winding  drainage  channel.  The  route  was  over  100  m  long  and  descended  over  25 
m  with  an  average  grade  of  about  10°. 

Traverse  termination  conditions 
for  both  field  tests  are  shown  in 
Chart  1.  No  interventions  were  u 
required  to  stop  the  vehicle  from  j 
driving  into  a  hazard  (zero  emer-  -j 
gency  stops).  At  Moses  Lake,  most  « 
traverses  (15  of  25)  ended  with  a  [ 
recoverable  fault.  On  Mauna  Kea  \ 
the  navigation  method  had  im-  2 
proved  and  most  traverses  ended 
by  reaching  the  goal  (8  of  20)  or 
stopping  the  traverse  for  other  rea¬ 
sons  (6  of  20).  Recoverable  faults 
are  those  that  could  be  remotely 
corrected  and  thus  would  not  be 
mission  ending  in  a  lunar  scenario. 

These  results  are  far  from  perfect 
but  indications  are  that  reliability  is 
improving  and  will  reach  the  level  of 
previous  planetary  rover  prototypes. 


Termination  Cause 


Chart  1.  Termination  conditions  in  autono¬ 
mous  navigation  experiments. 
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Conclusion 

The  Scarab  rover  has  been  uniquely  configured  for  the  transport  and  stabiliza¬ 
tion  of  a  coring  drill  and  associated  soil  analysis  instruments.  The  benefits  of 
central-mounting  and  active  body  height  and  roll  control  are  apparent  in  deploy¬ 
ment  of  the  drill  and  improved  ability  to  ascend  and  descend  cross-slope. 

Field  experimentation  has  quantified  drawbar  pull  and  slope  climbing  ability  as 
well  as  power  required  for  these  activities  under  a  variety  of  soil  and  terrain  condi¬ 
tions.  Field  demonstrations  have  also  proven  the  capability  of  the  laser-based 
navigation  system  for  kilometer-scale  autonomous  traverse,  including  autonomous 
descent  into  a  crater.  In  total  the  mobility  and  navigation  requirements  for  a  lunar 
surface  prospecting  mission  have  been  demonstrated  in  analogue  terrain. 
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Radar  Scan  Matching  SLAM  using  the 
Fourier-Mellin  Transform 
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Abstract  This  paper  is  concerned  with  the  Simultaneous  Localization  And  Map¬ 
ping  (SLAM)  problem  using  data  obtained  from  a  microwave  radar  sensor.  The 
radar  scanner  is  based  on  Frequency  Modulated  Continuous  Wave  (FMCW)  tech¬ 
nology.  In  order  to  meet  the  needs  of  radar  image  analysis  complexity,  a  trajectory  - 
oriented  EKF-SLAM  technique  using  data  from  a  360°  field  of  view  radar  sensor 
has  been  developed.  This  process  makes  no  landmark  assumptions  and  avoids  the 
data  association  problem.  The  method  of  egomotion  estimation  makes  use  of  the 
Fourier-Mellin  Transform  for  registering  radar  images  in  a  sequence,  from  which 
the  rotation  and  translation  of  the  sensor  motion  can  be  estimated.  In  the  context 
of  the  scan-matching  SLAM,  the  use  of  the  Fourier-Mellin  Transform  is  original 
and  provides  an  accurate  and  efficient  way  of  computing  the  rigid  transformation 
between  consecutive  scans.  Experimental  results  on  real-world  data  are  presented. 


1  Introduction 

Environment  mapping  models  have  been  studied  intensively  over  the  past  two 
decades.  In  the  literature,  this  problem  is  often  referred  to  as  simultaneous  local¬ 
ization  and  mapping  (SLAM).  For  a  broad  and  quick  review  of  the  different  ap¬ 
proaches  developed  to  address  this  problem,  one  can  consult  [2],  [8],  [9]  and  [25]. 
Localization  and  mapping  in  large  outdoor  environments  are  applications  related  to 
the  availability  of  efficient  and  robust  perception  sensors,  particularly  with  regard  to 
the  problem  of  maximum  range  and  the  resistance  to  the  environmental  conditions. 
Most  approaches  to  map  learning  generate  2D  models  from  range  sensor  data.  Even 
though  lasers  and  cameras  are  well  suited  sensors  for  indoor  environments,  their 
strong  sensitivity  to  atmospheric  conditions  has  created  an  interest  for  doing  SLAM 
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with  radars  and  sonars  [21].  Microwave  radar  provides  an  alternative  solution  for 
environmental  imaging  and  overcomes  the  shortcomings  of  laser,  video  and  sonar 
sensors.  In  this  paper,  a  trajectory-oriented  SLAM  technique  is  presented  using  data 
from  a  360°  field  of  view  radar  sensor.  This  radar  is  based  on  Frequency  Modulated 
Continuous  Wave  (FMCW)  technology  [16]. 

In  Section  2,  a  review  of  articles  related  to  our  research  interests  is  carried  out 
in  order  to  position  our  work  in  relation  to  existing  methods.  Section  3  presents 
the  microwave  radar  scanner  developed  by  a  Cemagref  research  team  (in  the  field 
of  agricultural  and  environmental  engineering  research)  [22].  The  way  to  obtain 
a  radar  image  (i.e.  the  power  spectra  with  polar  coordinates)  is  briefly  presented. 
Section  4  gives  the  SLAM  formulation  used  in  this  paper.  There,  the  Fourier-Mellin 
Transform  is  applied  to  register  images  in  a  sequence  and  to  estimate  the  rotation 
and  translation  of  the  radar  system  (see  Section  5).  This  process  makes  no  landmark 
assumptions,  and  avoids  the  data  association  problem  by  storing  a  detailed  map 
instead  of  sparse  landmarks.  Finally  Section  6  shows  experimental  results  of  this 
work,  which  were  implemented  (and  tested  on  recorded  real  data)  in  Matlab  and 
C/C++.  Section  7  concludes  and  introduces  future  work. 


2  Related  Work 

2.1  In  the  Field  of  Radar  Mapping 

In  order  to  perform  outdoor  SLAM,  laser  sensors  have  been  widely  used  [19]  [11] 
[4].  A  recent  application  with  Velodyne  HDL-64  3D  LIDAR  is  presented  in  [13]. 
To  provide  localization  and  map  building,  the  input  range  data  is  processed  using 
geometric  feature  extraction  and  scan  correlation  techniques.  Less  research  exists 
using  sensors  such  as  underwater  sonar  [21]  and  Frequency  Modulated  Continuous 
Wave  (FMCW)  radar.  Interestingly,  this  last  kind  of  sensor  was  already  used  by 
Clark  in  [6]  at  the  end  of  the  last  century.  In  an  environment  containing  a  small 
number  of  well  separated,  highly  reflective  beacons,  experiments  were  led  with  this 
sensor  to  provide  a  solution  to  the  SLAM  problem  [8]  using  an  extended  Kalman 
filter  framework  and  a  landmark  based  approach.  Finally,  in  [17],  methods  were 
presented  for  building  a  map  with  sensors  that  return  both  range  and  received  signal 
power  information.  An  outdoor  occupancy  grid  map  related  to  a  30  m  vehicle’s 
trajectory  is  analyzed.  So  far,  there  seems  to  have  been  no  trajectory-oriented  SLAM 
work  based  on  radar  information  over  important  distances.  However,  vision-based, 
large-area  SLAM  has  already  been  carried  out  successfully  for  underwater  missions, 
using  information  filters  over  long  distances  [10]  [15]. 
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2.2  In  the  Field  of  Scan  Matching  SLAM 


Since  Lu  and  Milios  presented  their  article  [14]  in  search  of  a  globally  consistent 
solution  to  the  2D-SLAM  problem  with  three  degrees  of  freedom  poses,  many  tech¬ 
niques  have  been  proposed  in  the  literature  concerning  robotics  as  well  as  computer 
vision.  A  common  method  of  pose  estimation  for  mobile  robots  is  scan  matching. 
By  solving  the  rigid  transformation  between  consecutive  scans  from  a  range  sen¬ 
sor,  the  robot’s  motion  in  the  time  period  between  the  scans  can  be  inferred.  The 
sensor  used  is  most  often  a  scanning  laser  range  finder.  One  of  the  most  popular 
approaches  for  scan  matching  is  the  Iterative  Closest  Point  (ICP)  algorithm  [3].  In 
ICP,  the  transformation  between  scans  is  found  iteratively  by  assuming  that  every 
point  in  the  first  scan  corresponds  to  its  closest  point  in  the  second  scan,  and  by  cal¬ 
culating  a  closed  form  solution  using  these  correspondences.  However,  sparse  and 
noisy  data,  such  as  that  from  an  imaging  radar,  can  cause  an  ICP  failure.  A  single 
noisy  reading  can  significantly  affect  the  computed  transformation,  causing  the  es¬ 
timated  robot  pose  to  drift  over  time.  Other  recent  trends  in  SLAM  research  are  to 
apply  probabilistic  methods  to  3D  mapping.  Cole  et  al.  [7]  use  an  extended  Kalman 
filter  on  the  mapping  problem.  Olson  et  al.  [18]  have  presented  a  novel  approach 
to  solve  the  graph-based  SLAM  problem  by  applying  stochastic  gradient  descent  to 
minimize  the  error  introduced  by  constraints. 

In  its  current  version,  our  algorithm  is  close  to  the  method  suggested  by  Cole  et 
al.  [7].  However,  the  Fourier-Mellin  Transform  for  registering  images  in  a  sequence 
is  used  to  estimate  the  rotation  and  translation  of  the  radar  sensor  motion  (see  Sec¬ 
tion  5).  In  the  context  of  scan-matching  SLAM,  the  use  of  the  Fourier-Mellin  Trans¬ 
form  is  original  and  provides  an  accurate  and  efficient  way  of  computing  the  rigid 
transformation  between  consecutive  scans.  It  is  a  global  method  that  takes  into  ac¬ 
count  the  contributions  of  both  range  and  power  information  of  the  radar  image. 


3  A  Microwave  Radar  Scanner 

The  exploited  radar  uses  the  frequency  modulation  continuous  wave  (FMCW)  tech¬ 
nique  which  has  been  known  for  several  decades  [23]  [16].  Frequency  modulation 
presents  two  advantages  for  mobile  robotics  application,  where  distances  are  hun¬ 
dreds  of  meters  [22] .  First,  it  permits  a  low  transmission  power,  which  is  safer  for 
the  user  (the  mean  power  determines  the  range).  Second,  a  transposition  of  tempo¬ 
ral  variables  into  the  frequency  domain  allows  to  obtain  the  measure  more  easily  (a 
very  short  delay  time  A  ns  switched  to  a  broad  variation  of  frequency  Af). 

The  FMCW  radar  is  called  K2Pi  (271  for  panoramic  -  in  K  band).  A  general 
view  of  the  radar  is  presented  in  Figure  1  and  its  main  characteristics  are  listed  in 
Table  1 .  The  radar  is  equipped  with  a  rotating  antenna  in  order  to  achieve  a  complete 
360°  per  second  monitoring  around  the  vehicle,  with  an  angular  resolution  of  3°,  in 
the  3-100  m  range.  The  image  construction  is  based  on  the  classical  Plan  Position 
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Fig.  1  Left  side.  The  K2Pi  FMCW  radar.  All  the  radar  components  are  implemented  in  the  same 
housing:  microwave  components,  electronic  devices  for  emission  and  reception,  and  the  data  ac¬ 
quisition  and  signal  processing  unit.  The  radar  is  mono-static:  a  single  antenna,  protected  by  a 
radome,  is  used  for  both  transmitting  and  receiving.  Right  side.  Two  consecutive  radar  images  ((a) 
&  (b))  that  are  fairly  similar. 


Indicator  (PPI)  representation,  i.e.  the  power  spectra  with  polar  coordinates.  An 
example  of  radar  images  is  presented  in  Figure  1.  Variations  of  shading  indicate 
variations  of  amplitude  in  the  power  spectra.  These  images  are  "radar  referenced": 
the  heading  indications  are  related  to  the  internal  encoder  of  the  radar  and  not  to  the 
earth’s  magnetic  field. 


Table  1  Characteristics  of  the  K2Pi  FMCW  radar. 


Carrier  frequency  F 0 

24  GHz 

Transmitter  power  Pt 

20  dBm 

Antenna  gain  G 

20  dB 

Bandwidth 

250  MHz 

Angular  resolution 

3° 

Angular  precision 

0.1° 

Range  Min/Max 

3  m/100  m 

Distance  resolution 

0.6  m 

Distance  precision  (canonical  target  at  100  m) 

0.05  m 

Size  (length-width-height) 

27-24-30  cm 

Weight 

10  kg 

4  Problem  Formulation 

4.1  SLAM  Process 


The  used  formulation  of  the  SLAM  problem  is  to  estimate  the  vehicle  trajectory 

1  T 


defined  by  the  estimated  state  = 


T  T 

Y  Y  Y1 


•  xvi  =  feh',  0*]  is  the  state 


vector  describing  the  location  and  orientation  of  the  vehicle  at  time  i.  There  is  no 
explicit  map;  rather  each  pose  estimate  has  an  associated  scan  of  raw  sensed  data 
that  can  be  next  aligned  to  form  a  global  map. 
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4.2  Radar  Scan  Matching  SLAM 


The  developed  approach  for  a  SLAM  process  is  based  on  the  following  observation: 
two  consecutive  radar  images  are  very  similar  to  the  "eye"  point  of  view.  For  that 
reason  a  matching  approach  based  on  cross  correlation  function  was  selected  [1]. 
Scan  matching  is  the  process  of  translating  and  rotating  a  radar  scan  such  that  a 
maximal  overlap  with  another  scan  emerges.  Assuming  this  alignment  is  approxi¬ 
mately  Gaussian,  a  new  vehicle  pose  is  added  to  the  SLAM  map  by  only  adding  the 
pose  to  the  SLAM  state  vector.  So,  as  described  previously,  observations  are  asso¬ 
ciated  to  each  pose.  They  are  compared  and  registered  to  offer  potential  constraints 
on  the  global  map  of  vehicle  poses.  This  is  not  only  useful  for  odometry  based  state 
augmentation,  but  it  is  also  an  essential  point  for  loop  closing. 

The  estimator  used  here  is  the  EKF,  but  it  is  not  a  limitation:  algorithms  like  those 
presented  in  Section  2  could  be  tested  too.  Given  a  noisy  control  input  u(k  +  1)  at 
time  k+  1,  upon  calculation  of  the  new  vehicle  pose,  xVn+l  (k  +  1| k),  and  a  corres¬ 
ponding  covariance  matrix,  P Vn+l{k+  l\k),  the  global  state  vector,  x,  and  corres¬ 
ponding  covariance  matrix,  P,  can  be  augmented  as  follows: 


P(Jfc+l|Jfc) 


P(k\k) 

d(xVn  ®u(fc+l)) 


x(k  +  1|  k)  = 
P(k|k) 


x(k\k) 

xVn  0u(k+  1) 

d(xVn®u(fc+l))J 

dxv„ 


p(*l*)  PVn+l(k+l\k) 


(1) 

(2) 


The  operator  0  is  the  well-known  displacement  composition  operator.  Pv  t  (k  0 
1  \k)  is  the  covariance  of  the  newly  added  vehicle  state.  Let  us  assume  that  two  scans, 
Si,  Sj,  have  been  registered.  So,  an  observation  Ty  of  the  rigid  transformation  be¬ 
tween  poses  in  the  state  vector  exists.  Therefore  a  predicted  transformation  between 
the  two  poses  can  be  found  from  the  observation  model  as  follows: 

T ij(k  +  1  \k)  =  h  (x(k  + 1  |k))  =  ©  (0xy (k  +  1  \k)  0 xv;-(k  +  1  \k))  (3) 

where  the  operator  0  is  the  inverse  transformation  operator.  This  is  then  used  as  the 
initial  estimate  for  our  registration  algorithm  as  follows: 

Ty(£+1)  =^(Tu(k+l|k),S/,S;-)  (4) 

where  represents  a  registration  algorithm.  The  state  update  equations  are  then  the 
classical  EKF  update  equations.  The  search  for  a  transformation  Ty  is  achieved  by 
maximizing  a  cross  correlation  function  [1]. 


6 


P.  Checchin,  F.  Gerossier,  C.  Blanc,  R.  Chapuis  and  L.  Trassoudaine 


5  Fourier-Mellin  Transform  for  Automatic  Image  Registration 
5.1  Principle 


The  problem  of  registering  two  scans  in  order  to  determine  the  relative  positions 
from  which  the  scans  were  obtained,  has  to  be  solved.  The  choice  of  an  algorithm  is 
strongly  influenced  by  the  need  for  real-time  operation.  A  FFT-based  algorithm  was 
chosen  to  perform  scan  matching. 

Fourier-based  schemes  are  able  to  estimate  large  rotations,  scalings,  and  transla¬ 
tions.  Let  us  note  that  the  scale  factor  is  irrelevant  in  our  case.  Most  of  the  DFT-based 
approaches  use  the  shift  property  [20]  [12]  [24]  of  the  Fourier  transform.  To  match 
two  scans  which  are  translated  and  rotated  with  respect  to  each  other,  the  phase  cor¬ 
relation  method  is  used,  stating  that  a  shift  in  the  coordinate  frames  of  two  functions 
is  transformed  in  the  Fourier  domain  as  a  linear  phase  difference.  To  deal  with  the 
rotation  as  a  translational  displacement,  the  images  are  previously  transformed  into 
an  uniform  polar  Fourier  representation. 

It  is  known  that  if  two  images  I\  and  h  differ  only  by  a  shift,  (Ax,  Ay),  (i.e., 
h(x,y)  =h(x  —  Ax,y  —  Ay)),  then  their  Fourier  transforms  are  related  by: 

h(wx,Wy).e-i{-WxAx+wyAy)  =  I2(wx,Wy).  (5) 

Hence  the  normalized  cross  power  spectrum  is  given  by 

=  ¥w‘'w’)  =  = (6) 
h(WX,Wy)  \h(WX,Wy)ll(WX,Wy)*\ 

where  *  indicates  the  complex  conjugate.  Taking  the  inverse  Fourier  transform 
Corr(x,y)  =  F~l  (Corr(wx,wy))  =  5(x  —  Ax,y  —  Ay),  which  means  that  Corr(x,y) 
is  nonzero  only  at  (Ax,  Ay)  =  argm2LX(x^{Corr(x,y)}.  If  the  two  images  differ  by 
rotational  movement  (0o)  with  translation  (Ax,  Ay),  then 

h(x,y)  =  AO  cos  0O  +  y  sin  0q  —  Ax,  —  vsin0o  -f-ycos0o  —  Ay).  (7) 

Converting  from  rectangular  coordinates  to  polar  coordinates  makes  it  possible  to 
represent  rotation  as  shift:  The  Fourier  Transform  in  polar  coordinates  is  h(p,  0)  = 
e-i{wxAx+wyAy)f^p  q  _  Let  Mi  and  M2  denote  the  magnitudes  of  I\  and  h 
(Mi  =  |/i|,  M2  =  |/2|).  So,  Mi  and  M2  are  related  by  M\(p,Q)  =  M2(p, 0  —  0o). 
The  shift  between  the  two  images  can  now  be  resolved  using  Eq.  6. 


5.2  Scan  Registration 


In  order  to  perform  a  scan  registration  algorithm,  the  Fourier-Mellin  Transform 
(FMT)  has  been  chosen  [5]  [20].  The  FMT  is  a  global  method  that  takes  the  con¬ 
tributions  from  all  points  in  the  images  into  account  in  order  to  provide  a  way  to 
recover  all  rigid  transformation  parameters,  i.e.  rotation,  translation.  It  is  an  effi¬ 
cient  and  accurate  method  to  process  a  couple  of  images  that  are  fairly  similar  (see 
Fig.  1).  The  steps  of  the  scan  registration  algorithm  are  described  in  Alg.  1. 
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Algorithm  1  Steps  of  the  Fourier-Mellin  Transform  algorithm  applied  to  FMCW 
radar  images 

1 .  Get  radar  images  4  and  4- 1  • 

2.  Apply  thresholding  filter  to  eliminate  the  speckle  noise  in  both  images. 

3.  Apply  FFT  to  images  4  — »  4  and  4-1  — ►  4-1  • 

4.  Compute  the  magnitudes  M^=  |4| ,  A4_i  —  |4-i  | 

5.  Transform  the  resulting  values  from  rectangular  to  polar  coordinates.  Af()  — ►  MPQ. 

6.  Apply  the  FFT  to  polar  images,  a  bilinear  interpolation  is  used.  MPQ  — »  MPQ. 

7.  Compute  Corr(wp ,  )  between  MP& (wp ,  )  and  MP i  (wp ,  )  using  Eq.  6. 

8.  Compute  the  inverse  FFT  Corr(p ,  0)  of  Corr(wp  ,wq). 

9.  Find  the  location  of  the  maximum  of  CorrQ  and  obtain  the  rotation  value. 

10.  Construct  a  new  image  Ir  by  applying  reverse  rotation  to  4-1  • 

1 1 .  Apply  FFT  to  image  7r^_  i . 

12.  Compute  the  correlation  Corr(wx,wy)  using  Eq.  6. 

13.  Take  inverse  FFT  Corr(x,y)  of  Corr(wX)wy). 

14.  Obtain  the  values  (Ax,  Ay)  of  the  shift. 


6  Experimental  Results 

This  section  provides  experimental  results  of  the  Scan  SLAM  application  using  the 
radar  sensor  previously  described.  The  radar  and  the  proprioceptive  sensors  were 
mounted  on  a  utility  car  moving  at  a  speed  ranging  from  0  to  25  km/h.  Here,  two  ex¬ 
perimental  runs  are  presented.  They  were  performed  in  an  outdoor  field,  Blaise  Pas¬ 
cal  University  campus,  with  a  complex  environment  (buildings,  cars,  trees,  roads, 
road  signs,  etc.).  The  radar  was  on  top  of  the  vehicle,  3  meters  above  the  ground. 
The  estimated  trajectories  obtained  with  the  Scan  SLAM  process  are  presented  in 
Figures  2  and  5.  The  successive  positions  of  the  radar  are  separated  by  an  interval 
of  one  second.  The  photograph  (see  Fig.  2)  is  an  aerial  image  of  the  experimental 
zone.  The  trajectory  of  the  vehicle  simultaneously  measured  with  a  centimetrically- 
precise  GPS  is  overlayed.  For  these  experiments,  all  data  acquisitions  have  been 
realized  in  real  time  but  SLAM  processing  has  been  realized  off-line.  One  step  of 
the  process  (scan  registration,  prediction  and  update)  is  achieved  in  less  than  one 
second  with  Matlab  on  a  dual-core  2  GHz  laptop.  A  quantitative  evaluation  of  the 
localization  performances  of  the  implemented  process  has  been  achieved.  The  po¬ 
sition  errors  are  calculated  using  the  estimates  and  GPS  data,  assumed  to  be  the 
ground  truth  (see  Fig.  3  and  Fig.  6).  The  first  experiment  was  made  on  a  distance 
of  1,135  m  without  loop  closing.  Figure  2  shows  the  trajectory  of  the  vehicle.  In 
Figure  3,  error  and  standard  deviation  (lower  and  upper  bounds)  are  presented.  The 
global  map  that  is  obtained  is  shown  in  Figure  4.  The  second  experiment  was  made 
on  a  distance  of  700  m  with  loop  closing  (a  circular  trajectory  around  the  campus 
sports-ground).  In  Figure  5,  the  corrected  trajectory  after  loop  closing  is  presented. 
In  Figure  6,  error  and  standard  deviation  (lower  and  upper  bound)  are  presented. 
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Fig.  2  Overlay  of  the  estimated  trajectory  and  the  aerial  image  of  Blaise  Pascal  University  campus. 
The  total  traveled  distance  is  around  1,135  m.  The  thin  red  line  shows  the  trajectory  of  the  vehicle 
measured  with  a  centimetrically-precise  GPS.  The  vehicle  estimates  are  in  thick  white  dashes. 


Fig.  3  Error  and  standard  deviation  (lower  and  upper  bounds)  related  to  the  trajectory  depicted  in 
Fig.  2.  Near  sample  300,  there  is  a  GPS  loss. 


Fig.  4  Global  map  related  to  the  trajectory  depicted  in  Fig.  2. 


7  Conclusion  and  Future  Work 

This  paper  presented  results  of  SLAM)  using  a  microwave  radar  sensor.  Due  to 
the  complexity  of  radar  target  detection,  identification,  tracking  and  association, 
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Fig.  5  The  total  traveled  distance  is  around  700  m.  The  thin  red  line  shows  the  trajectory  of  the 
vehicle  measured  with  a  centimetrically-precise  GPS.  The  vehicle  estimates  after  the  loop  closing 
are  in  thick  white  dashes. 


Fig.  6  Standard  deviation  along  the  north  (x)  and  west  (y)  axes  between  pose  estimation  and  GPS 
and  influence  of  the  loop  closing. 


a  trajectory-oriented  SLAM  process  based  on  the  Fourier-Mellin  Transform  was 
developed;  in  this  way,  target  assumptions  about  their  position  and  nature  were 
avoided. 

Currently,  this  work  considers  only  a  static  environment,  assuming  that  there  are 
no  mobile  elements  around  the  radar.  However,  in  order  to  develop  a  perception 
solution  for  high  velocity  robotics  applications,  future  work  will  be  devoted  to  the 
enhancement  of  the  global  map  using  methods  such  as  the  one  described  in  [18]. 
Once  the  sensor  delivers  the  measurement  of  Doppler  frequency  to  take  the  relative 
velocity  of  mobile  targets  into  account,  integration  of  SLAM  with  Mobile  Object 
Tracking  (SLAMMOT)  will  be  considered  [25]. 
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An  Automated  Asset  Locating  System  (AALS) 
with  Applications  to  Inventory  Management 

Thomas  H.  Miller,  David  A.  Stolfo,  and  John  R.  Spletzer 


Abstract  In  this  work,  we  present  a  proof-of-concept  Automated  Asset  Locating 
System  (AALS)  for  enhancing  inventory  management.  AALS  integrates  LIDAR 
and  RFID  sensor  measurements  into  a  Rao-Blackwellized  particle  filter  for  simul¬ 
taneously  localizing  its  pose  with  the  positions  of  assets  in  the  environment.  We 
present  significant  experimental  results  where  the  proof-of-concept  system  success¬ 
fully  traveled  a  total  distance  of  1 .4  km  autonomously,  while  detecting  and  mapping 
all  143  available  assets  in  real-time,  and  with  a  mean  position  error  of  <  80  cm. 


1  INTRODUCTION 

Radio  Frequency  Identification  (RFID)  systems  use  radio  frequency  to  identify,  lo¬ 
cate  and  track  features  of  interest.  The  technology  sees  widespread  use  in  commer¬ 
cial  applications  to  include  baggage  handling,  passport  readers,  and  toll  collection 
to  name  but  a  few  [1].  There  are  several  RFID  variants:  passive,  semi-passive,  and 
active.  In  this  work,  we  limit  our  discussion  to  the  former.  A  passive  RFID  system  is 
composed  of  three  primary  components:  a  reader  (RF  transmitter/receiver),  a  passive 
tag,  and  a  host  computer.  The  tag  is  composed  of  an  antenna  coil  and  an  integrated 
circuit  that  contains  both  modulation  circuitry  and  non-volatile  memory.  The  tag  is 
energized  by  the  RF  carrier  signal  transmitted  by  the  reader.  Using  this  scavenged 
energy,  the  information  stored  on  the  tag  -  to  include  a  unique  identifier  for  that  tag 
instance  -  can  be  transmitted  back  to  the  reader  [2] .  The  strength  of  RFID  is  that 
it  explicitly  solves  the  data  association  problem.  As  each  tag  is  associated  with  a 
unique  identifier,  false  correspondences  across  tag  detections  are  eliminated. 

In  this  work,  we  investigate  the  potential  for  applying  RFID  and  robotics  tech¬ 
nologies  to  inventory  management  tasks.  Manual  intervention  in  material  tracking 
systems  is  labor  intensive,  costly,  and  error-prone  [3].  Furthermore,  low-frequency 
“scheduled  scanning”  approaches  cannot  ensure  that  inventory  remains  up-to-date. 
The  ability  to  automate  the  material  tracking  task  can  dramatically  enhance  as- 
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set  visibility.  To  this  end,  we  demonstrate  an  Automated  Asset  Locating  System 
(AALS)  that  integrates  LIDAR  and  RFID  sensing  on  a  mobile  robot  base  for  en¬ 
hanced  inventory  management.  The  RFID  system’s  role  is  dual  purpose.  First,  the 
tags  serve  to  identify  assets  to  be  tracked.  Second,  they  are  integrated  into  the  envi¬ 
ronment  as  correspondence-free  landmarks.  In  this  role,  they  effectively  introduce 
dramatic,  artificial  asymmetries  into  the  environment.  This  enables  reliable  robot 
localization  indoors  even  in  largely  symmetric  environments,  and  where  the  scale  of 
the  environment  was  large  compared  to  the  range  of  the  robot’s  sensors  -  conditions 
which  could  be  problematic  for  traditional  SLAM  and  localization  approaches.  The 
RFID  tag’s  extremely  compact  size  (^10-30  cm2  stickers)  and  low  cost  ($0.1- 1.0) 
allows  them  to  be  discretely  integrated  into  the  environment.  The  net  result  is  an 
automated  system  capable  of  reliably  locating  assets  in  the  environment. 

2  RELATED  WORK 

Several  researchers  have  investigated  the  convergence  of  robotics  and  RFID  tech¬ 
nologies.  Most  related  to  our  work  is  that  of  Hahnel  et  al  [4],  where  a  Pioneer  2 
Robot  equipped  with  a  Sick  LMS200  and  an  RFID  reader  was  manually  steered 
through  the  environment.  Using  a  map  generated  a  priori ,  the  authors  employed 
Monte-Carlo  localization  (MCL)  to  estimate  the  position  of  RFID  tags  in  the  envi¬ 
ronment.  Formal  results  on  tag  localization  accuracy  were  not  provided.  However, 
they  demonstrated  that  using  these  same  tags  as  landmarks,  robot  localization  could 
be  achieved  using  only  RFID  measurements  (although  not  to  the  same  level  of  ac¬ 
curacy  as  when  the  LIDAR  system  was  used).  Schneegans  et  al  [5]  built  on  this 
to  demonstrate  a  system  for  robot  localization  using  a  more  sophisticated  sensor 
model,  and  whereby  an  RFID  snapshot  was  associated  with  a  database  of  learned 
features.  They  compared  their  approach  with  those  from  [4],  and  found  comparable 
accuracy  in  the  end  position  estimate  of  the  robot,  but  a  significantly  faster  filter 
convergence  rate.  This  work  was  also  done  off-line. 

There  is  also  significant  work  that  has  emphasized  using  RFID  to  assist  in  local¬ 
ization  tasks.  Kulyukin  et  al  incorporated  RFID  into  a  robotic  assistant  for  the  visu¬ 
ally  impaired  [6].  Tsukiyama  demonstrated  a  limited  implementation  where  RFID 
tags  served  as  topological  landmarks  enabling  the  robot  to  correctly  follow  a  path 
[7].  Mapping  the  position  of  assets  was  not  considered.  Chae  and  Han  used  a  topo¬ 
logical  approach  with  RFID  and  a  vision  sensor  [8].  Experimental  results  were  again 
off-line.  Miah  and  Gueaieb  examined  using  tag  received  power  (TRP)  to  estimate 
the  distance  from  the  robot  to  the  tag  [9] .  However,  their  implementation  was  lim¬ 
ited  to  simulations.  Milella  et  al  developed  an  RFID-assisted  mobile  robot  system 
for  mapping  and  surveillance  using  fuzzy  inference  methods  [10].  In  terms  of  as¬ 
set  tracking  task,  Ehrenberg  et  al  investigated  the  use  of  a  LibBot  to  locate  books 
in  a  library  environment  [11].  They  localized  densely  packed,  short  range  tags  by 
again  employing  a  probabilistic  RFID  antenna  model.  The  actual  implementation 
was  rather  limited  however,  with  experiments  only  over  a  single  library  shelf. 

Our  work  differs  from  these  efforts  in  several  ways.  First,  we  employ  a  Rao- 
Blackwellized  particle  filter  for  the  simultaneous  localization  of  the  robot  pose  and 
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mapping  of  asset  positions  in  the  environment.  Second  and  more  significantly,  un¬ 
like  these  efforts  we  provide  significant  experimental  results  with  AALS  operating 
on-line.  In  our  experiments,  AALS  is  completely  responsible  for  its  own  navigation 
as  it  self-localizes  and  maps  the  positions  of  assets  in  the  environment.  These  results 
show  that  AALS  is  capable  of  reliably  detecting  and  mapping  the  position  of  assets 
in  the  environment  in  real-time. 


3  THE  DEVELOPMENT  PLATFORM 

The  AALS  proof-of-concept  system  was  built  upon  an  iRobot  Create  robotics  devel¬ 
opment  platform.  The  Create  is  an  excellent  low-cost  research  platform,  combining 
a  robust  mobile  chassis  with  a  higher  level  motor  control  interface  through  RS-232 
communication,  odometry  feedback,  limited  sensing,  and  5V  DC  power  output.  The 
other  primary  components  of  AALS  are: 

Computing  With  the  exception  of  motor  control  which  ran  on  the  Create’ s  embed¬ 
ded  computer,  all  computing  was  done  on  a  Lenovo  X200  laptop  with  a  2.4  GHz 
Core  2  Duo  processor  and  2  GB  memory. 

LIDAR.  The  primary  exteroceptive  sensor  for  AALS  was  a  Hokuyo  URG-04LX 
LIDAR.  The  URG-04LX  provides  a  240°  field  of  view  with  an  angular  resolution 
of  0.36°  .  It  offers  an  advertised  range  of  up  to  5.6  meters,  although  in  this  applica¬ 
tion  we  found  a  more  accurate  estimate  to  be  <4.5  meters. 

RFID.  The  RFID  transceiver  used  in  this  work  was  a  Skyetek  M9  operating  at  862- 
955  MHz.  We  deliberately  chose  an  UHF  module  to  maximize  range.  The  reader 
was  multiplexed  to  a  pair  of  antennae  oriented  to  maximize  detection  coverage  to 
the  front  and  sides  of  the  robot.  To  date,  all  development  has  been  done  using  the 
Alien  Technology  ALN-9534  Gen  2  tag.  In  an  evaluation  of  available  Gen  2  tags, 
this  model  provided  acceptable  detection  ranges  (up  to  4.0  meters)  while  exhibiting 
fairly  good  omnidirectional  performance  in  a  compact  footprint.  Images  of  AALS, 
showing  the  integration  of  on-board  computing,  the  URG-04LX,  the  M9,  multi¬ 
plexer,  and  antennae  are  at  Figure  1 . 


Fig.  1:  Top  and  side  profiles  of  AALS  showing  the  integration  of  Hokuyo  URG-04LX  LIDAR, 
RFID  reader,  and  on-board  computing. 
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4  ROBOT  LOCALIZATION  &  ASSET  TRACKING 

For  robot  localization  and  asset  tracking,  we  employed  a  Rao-Blackwellized  Particle 
Filter  (RBPF).  Such  approaches  were  first  introduced  to  the  robotics  community  by 
Doucet  et  al  [12],  who  observed  that  the  simultaneous  localization  and  mapping 
(SLAM)  problem  could  be  factored  into  two  sub-problems 

m 

n  p(i  i\xi..t,Zl..t)  (1) 

(=1 

where  x\mmt  denotes  the  robot  pose  over  time,  /i..m  the  m  landmark  positions,  z  the 
sensor  measurements,  and  u  the  control  inputs.  The  left  term  on  the  right  side  of  (1) 
corresponds  to  the  robot  localization  problem,  and  the  right  term  to  estimating  the 
position  of  m  conditionally  independent  landmarks  in  the  map.  This  partitioning  en¬ 
abled  the  robot  localization  problem  to  be  solved  using  a  traditional  particle  filtering 
approach,  while  allowing  the  mapping  problem  to  be  estimated  through  analytical 
methods.  The  significance  of  this  factorization  was  that  it  mitigated  the  otherwise 
exponential  increase  of  particle  samples  with  increases  in  state  space  dimension 
(/.<?.,  the  number  of  landmarks).  This  result  was  leveraged  by  Montemerlo  et  al  in 
developing  FastSLAM  [13],  where  mapping  was  accomplished  by  associating  m 
Extended  Kalman  Filters  (EKFs)  with  each  particle  to  independently  track  the  m 
landmarks  We  employed  a  similar  approach,  using  Monte-Carlo  Localization 
(MCL)  to  estimate  the  robot  pose,  and  Kalman  Filters  for  asset  tracking. 

4.1  Sensor  Model  Development 

The  LIDAR  Sensor  Model.  AALS  relies  heavily  upon  the  Hokuyo  URG-04LX 
for  localization.  The  URG-04LX  is  extremely  compact  and  lightweight  compared 
to  the  ubiquitous  Sick  LMS2xx  series  LIDARs,  which  made  it  well  suited  for  our 
proof-of-concept.  However,  they  are  also  myopic,  demonstrating  an  effective  range 
of  <4.5  meters  in  our  experiments.  Such  limited  range  can  be  a  challenge  for 
MCL  approaches,  which  solves  a  data  association  problem  relating  the  robot  pose 
[x(t),y(t),  0(t)]T  vs.  time  through  asymmetries  in  the  environment.  Our  develop¬ 
ment  site  consisted  of  two  building  wings  connected  by  a  corridor  «  40  meters  in 
length  with  little  asymmetry.  To  mitigate  the  potential  for  the  filter  converging  too 
quickly  (and  likely  incorrectly),  the  conditional  density  functions  which  model  the 
uncertainty  in  LIDAR  measurements  (and  are  used  to  weight  the  individual  parti¬ 
cles)  were  dramatically  smoothed.  As  a  result,  even  a  relatively  improbable  mea¬ 
surement  was  unlikely  to  penalize  a  particle  dramatically.  We  found  that  such  a 
PDF  would  ensure  that  the  robot’s  pose  would  eventually  converge  to  the  correct 
position/orientation  even  without  input  from  the  RFID  sensors  and  regardless  of  the 
initial  robot  pose. 

The  RFID  Sensor  Model.  Two  different  models  were  used  for  the  RFID  sensor  de¬ 
pending  upon  its  given  role.  The  primary  purpose  of  landmark  tags  (with  positions 
known  a  priori )  was  to  provide  a  low-cost  mechanism  for  enhancing  localization  ro- 
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Fig.  2:  (Left)  Weighting  function  for  landmark  tags  used  in  the  robot  localization  process.  (Right) 
PDF  for  asset  tag  detection  generated  empirically. 


bustness,  as  there  was  no  potential  for  data  association  errors.  We  considered  their 
ability  to  assist  in  pose  estimates  and  improve  filter  convergence  as  demonstrated 
in  [4]  of  secondary  importance.  Therefore,  we  assumed  no  relative  orientation  in¬ 
formation  was  available  and  a  symmetric  scaling  function  S  was  used  to  reflect  the 
likelihood  of  landmark  detection  by  the  robot.  To  model  this,  we  defined  a  critical 
radius  r*  around  each  landmark  where  detection  was  expected  based  upon  empirical 
results.  With  r*  so  defined,  the  weight  function  used  was 


S(i)  = 


(d(i)<r*)  +  (d(i)>r*)^2 


(2) 


where  d(i)  =  |  |(x,y)r  —  (x;,y*)r|  |  was  the  Euclidean  distance  from  the  robot  to  the 
ith  landmark.  When  used  in  conjunction  with  the  MCL  process,  particles  within  the 
critical  radius  of  a  detected  landmark  are  unaffected,  while  the  weights  of  those  out¬ 
side  are  scaled  inversely  proportional  to  the  squared  distance  to  the  landmark.  This 
is  illustrated  at  Figure  2  (left).  The  motivation  for  the  quadratic  model  is  the  Friis 
Transmission  Equation,  which  shows  that  the  power  ratio  between  receiving  and 
transmitting  antennae  are  inversely  proportional  to  their  distance  squared  [14].  The 
placement  of  only  several  landmark  tags  in  the  environment  dramatically  acceler¬ 
ated  particle  filter  convergence  during  our  experiments. 

For  asset  detection,  we  assumed  that  the  estimated  robot  pose  was  approximately 
correct.  As  such,  the  sensor  model  was  directional  to  reflect  the  relative  robot/asset 
tag  orientation.  We  initially  generated  a  discrete  PDF  model  empirically  by  collect¬ 
ing  detection  data  as  a  function  of  tag  position,  orientation,  and  height  as  in  [5].  The 
resulting  two-dimensional  PDF  estimate  in  the  antenna  frame  is  shown  at  Figure  2 
(right).  The  PDF  is  highly  non-Gaussian,  and  does  not  lend  itself  to  a  Kalman  filter 
implementation.  However,  in  reality  this  model  -  as  well  as  those  typically  used 
in  related  work  -  is  ad-hoc.  Antenna  performance  is  strongly  environment  specific. 
Signal  is  strongly  tied  to  reflections  from  the  floor,  walls,  ceiling,  obstacles,  signal 
absorption,  the  amount  of  metal  in  the  environment,  tag  line-of-sight,  the  object  to 
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which  the  tag  is  affixed,  etc. .  In  fact,  in  preliminary  testing  we  compared  a  voting 
approach  based  upon  our  discrete  PDF  model  with  a  pure  Kalman  filter  using  an 
overly  conservative  approximation  of  this  PDF.  The  latter  demonstrated  equal  or 
better  performance,  and  as  such  we  ultimately  employed  such  an  approach. 

4.2  Robot  Localization  &  Asset  Position  Estimation 

For  the  most  part,  robot  localization  was  accomplished  using  a  traditional  MCL 
approach  [15].  The  time  update  phase  corresponded  to  the  transformation  of  the 
particles’  poses  using  a  unicycle  model  for  robot  motion.  Measurement  updates 
using  the  LIDAR  were  also  straightforward.  However,  an  additional  measurement 
update  stage  was  integrated  for  whenever  a  landmark  tag  was  detected.  In  this  event, 
samples  were  re-weighted  based  upon  (i)  =  S(j)wk(i)  where  Wk(i)  denotes  the 
current  weight  of  the  ith  particle  at  time-step  k ,  and  S(j)  the  scaling  function  defined 
by  (2).  After  re-weighting,  the  particle  set  was  re-sampled.  The  net  effect  was  that 
particles  far  away  from  landmark  j  were  quickly  killed  off. 

With  the  ability  to  reliably  localize  the  robot,  we  turn  to  the  case  of  map¬ 
ping  assets.  To  this  end,  each  particle  /  =  \  ...n  ,  in  our  RBPF  maintains  a 
Kalman  filter  that  propagates  an  estimate  for  the  position  and  positional  covariance 
{x(i,  j) ,  Z  (/,  j) },  j  =  1 . . .  m,  for  each  of  the  m  assets  detected.  Note  that  RFID  asset 
detections  are  not  used  to  refine  the  robot  pose  estimate,  so  the  asset  position  esti¬ 
mates  remain  uncorrelated.  As  a  result,  only  n  of  the  mn  total  Kalman  filters  need 
be  updated  for  a  given  asset  detection. 

We  model  each  RFID  asset  detection  as  a  direct  estimate  of  the  asset’s  position, 
i.e.,  z  =  wTaXa  where  xa  is  the  tag  position  estimate  in  the  antenna  frame,  and  wTa 
maps  points  from  the  antenna  frame  to  world  frame.  The  associated  measurement 
covariance  is  then  Zr  =  R(Gr  +  0a)ZaR(0r  +  0a)t  where  Za  denotes  the  estimated 
uncertainty  in  the  antenna  frame,  and  R  is  a  2-D  rotation  matrix  associated  with  the 
robot  Or  and  antenna  Oa  orientations  in  the  world  and  robot  frames,  respectively.  The 
measurement  update  is  then  textbook  Kalman  Filter,  and  since  the  asset  position  is 
assumed  static  there  is  no  process  update. 


5  EXPERIMENTAL  RESULTS 
5.1  Component  Level  Testing 

As  part  of  the  proof-of-concept,  we  performed  component  level  testing  to  determine 
the  robustness  of  tag  detection  as  a  function  of  tag  density.  Of  concern  was  the  po¬ 
tential  for  message  collisions  if  multiple  irradiated  tags  attempted  to  transmit  at  the 
same  time.  To  this  end,  we  examined  both  linear  arrays  of  5-15  tags,  and  grid  arrays 
of  12  tags  (3  x  4)  with  inter-tag  spacings  ranging  from  0-45  cm.  This  also  included 
different  heights  above  the  ground  plane.  A  representative  linear  array  configura¬ 
tion  with  10  cm  spacing  is  shown  at  Figure  3  (left).  For  each  test  geometry,  AALS 
was  driven  multiple  times  past  the  tag  array  at  standoff  distances  consistent  with  an 


Automated  Asset  Locating  System  (AALS) 


7 


expected  detection  based  upon  the  sensor  model  derived  in  Section  4.1.  A  tag  was 
considered  detected  if  it  was  successfully  identified  at  least  one  time  while  AALS 
traversed  the  array.  Summary  statistics  are  shown  at  Figure  3  (center). 


Fig.  3:  (Left-Center)  Sample  RFID  linear  array  used  during  component  level  testing.  All  configu¬ 
rations  demonstrated  at  least  a  93%  success  rate.  (Right)  Sample  asset  configuration  during  system 
level  testing. 

There  were  908  true  positives,  17  false  negatives,  and  0  false  positives.  Sixteen  of 
the  17  false  negatives  were  with  grid  arrays  with  inter- tag  spacing  of  5  cm  (14)  and 
15  cm  (2).  These  corresponded  to  tag  densities  of  100  and  30  tags/m2,  and  detection 
rates  were  93%  and  96%,  respectively.  These  results  indicate  that  the  anti-collision 
protocols  employed  by  the  system  worked  very  well  for  the  range  of  geometries 
tested  even  under  very  high  tag  densities. 


5.2  System  Level  Testing 

To  demonstrate  the  system  level  proof-of-concept,  we  conducted  a  series  of  exper¬ 
iments  using  the  fourth  floor  of  Packard  Laboratory  at  Lehigh  University  as  the 
development  site.  This  constituted  a  region  ^  48  x  14  meters.  Our  map  M  repre¬ 
sentation  was  an  occupancy  grid  with  a  cell  resolution  of  10  cm,  and  was  provided 
to  AALS  a  priori.  The  map  was  constructed  from  digital  blue  prints.  While  nomi¬ 
nally  correct,  there  were  significant  inconsistencies  between  this  map  and  the  actual 
floorplan.  Only  the  most  serious  of  these  were  corrected.  One  final  alteration  to  M 
included  the  introduction  of  4  landmark  tags  with  positions  also  known  a  priori  by 
the  robot.  These  were  spaced  approximately  every  15  meters  in  our  corridor  set. 
Finally,  10-15  assets  (/.<?.,  card  board  boxes  and  plastic  bins  with  tags  affixed)  were 
placed  in  random  locations  throughout  the  environment.  A  representative  configu¬ 
ration  is  at  Figure  3  (right). 

For  global  path  planning,  AALS  was  provided  a  route  network  graph  G(V,E) 
that  delineated  in  continuous  space  the  intended  paths  for  navigation.  Waypoints  in 
the  route  network  corresponded  to  vertices  v/  E  V  of  G,  and  the  edge  set  EcG  corre¬ 
sponded  to  path  segments  where  each  E  E  connected  a  pair  of  waypoints  (v*,  Vj). 
The  desired  path  for  a  given  mission  was  then  specified  via  a  waypoint  sequence 
(v;,y/,. . .  ,vw).  For  motion  planning,  AALS  relied  upon  2  modes:  obstacle  avoid¬ 
ance,  and  path  following.  Prior  to  particle  filter  convergence  or  in  the  event  that  the 
specified  route  segment  was  blocked,  AALS  would  operate  in  obstacle  avoidance 
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mode.  For  path-following,  a  PD  controller  was  used  where  the  normal  distance  to 
the  current  route  segment  was  employed  as  an  error  metric.  The  typical  mission  for 
AALS  entailed  a  complete  circuit  of  the  test  area.  This  corresponded  to  a  mission 
length  of  «  125  meters. 


Fig.  4:  Mission  results  showing  the  actual  (blue  “*”)  and  estimated  (red  “numbers”)  asset  locations. 
The  mean  position  error  in  this  trial  was  54  cm. 


After  preliminary  testing  to  characterize  the  system,  a  total  of  12  missions  were 
conducted.  During  these  trials,  the  starting  point  was  varied,  as  were  the  position 
and  orientation  of  assets.  This  ensured  that  asset  detection  and  mapping  was  pos¬ 
sible  with  tag  orientations  parallel  and  orthogonal  to  the  robot  path.  The  geometry 
changes  were  also  done  to  ensure  that  the  sensor  model  for  the  Kalman  filter  was  not 
deliberately  biased.  For  each  mission,  AALS  drove  at  a  nominal  linear  velocity  of 
0.3  m/s.  At  the  initiation  of  each  trial,  10,000  particles  were  used  to  instantiate  the 
prior  for  the  robot  pose.  This  number  was  reduced  dynamically  to  as  few  as  several 
hundred  particles  using  the  second-order  statistics  to  infer  convergence  of  the  parti¬ 
cle  set.  To  further  support  real-time  computation,  LIDAR  range  measurements  were 
sub-sampled  to  an  angular  resolution  of  1 .08°  .  The  target  update  rate  for  AALS  was 
2  Hz.  At  the  conclusion  of  a  given  mission,  the  estimate  for  the  position  of  assets 
was  determined  from 


.  i  —  1 . . .  m  (3) 

ij 

where  [x,y]J  denotes  the  position  of  the  ith  landmark,  [x,y]Jj  the  \th  landmark  posi¬ 
tion  as  estimated  by  the  Kalman  filter  of  the  ]th  particle,  and  Wj  is  the  corresponding 
sample  weight  at  mission  completion.  Results  from  a  representative  mission  are  at 
Figure  4.  This  shows  the  route  network  (green  lines),  the  path  as  estimated  by  the 
robot  (red  dashed  lines),  the  position  of  landmark  tags  (yellow  circles),  and  the  ac¬ 
tual  (blue  “*”)  and  estimated  (red  “numbers”)  positions  of  assets. 

Of  the  12  missions,  11  were  completed  successfully.  The  one  failure  occurred 
when  an  asset  was  deliberately  placed  across  the  path.  The  motion  planner  incor¬ 
rectly  determined  the  path  was  not  traversable,  and  aborted  the  mission.  The  motion 
planner  was  subsequently  modified,  and  this  same  configuration  was  successfully 
re-tested.  The  1 1  completed  missions  constitute  a  total  distance  traveled  of  1 .4  km. 
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During  this  time,  all  143  assets  that  were  placed  in  the  environment  were  detected. 
The  estimated  asset  positions  were  then  compared  with  hand-measured  ground-truth 
values.  Statistics  for  the  different  configurations  are  shown  at  Table  1.  Border  and 
interior  configurations  discriminate  as  to  whether  the  asset  was  located  on  the  map 
border  or  in  the  interior.  Parallel/normal  to  path  refers  to  the  antenna  orientation 
with  respect  to  the  robot’s  primary  direction  of  travel. 


Asset  Configuration 

Number  Samples 

Number  Detected 

MPE  (cm) 

<T  (cm) 

All 

143 

143 

79.2 

49.5 

Border 

130 

130 

79.4 

49.8 

Interior 

13 

13 

77.1 

48.3 

Parallel  to  path 

91 

91 

86.8 

55.5 

Normal  to  path 

52 

52 

65.8 

33.3 

Table  1:  Mean  Position  Error  (MPE)  for  detected  assets  as  a  function  of  geometry. 


From  these,  we  see  that  the  average  position  error  was  <80  cm.  There  was  little 
difference  between  assets  that  were  located  within  the  interior  or  along  the  border  of 
the  map  (we  should  note  that  no  optimizations  were  done  to  asset  location  estimates 
that  were  outside  the  boundary  of  the  map,  which  would  have  improved  results).  We 
do  note  a  fairly  significant  difference  between  tag  orientations  that  were  normal  vs. 
parallel  to  the  robot’s  direction  of  travel.  This  appears  to  be  attributed  to  the  normal 
antennae  being  detected  at  longer  standoff  distances,  and  the  associated  Kalman 
filters  seeing  a  larger  number  of  measurement  updates.  However,  further  analysis  is 
needed  to  support  this  hypothesis.  We  should  note  that  in  a  warehouse  or  similar 
environment  where  such  a  system  would  be  used,  tag  orientation  would  typically  be 
parallel  to  the  direction  of  travel  and  as  such  these  errors  are  more  representative. 

For  portions  of  three  trials,  we  also  estimated  robot  position  using  a  Sick 
LMS291-S14  to  track  a  retro-reflector  affixed  to  the  robot.  Using  this  as  ground 
truth,  the  mean  absolute  position  error  of  the  robot  localization  system  was  53.3  cm 
(<jx  =  49.3  cm,  <7y=19.1  cm).  The  bias  was  not  surprising  due  to  the  strong  symmetry 
and  limited  configuration  space  in  the  v  and  y  directions,  respectively.  Taking  these 
findings  into  consideration,  a  more  accurate  estimate  of  tag  localization  performance 
would  be  a  MAE  of  ~60  cm. 


6  DISCUSSION 

In  this  work,  we  demonstrated  a  proof-of-concept  Automated  Asset  Locating  Sys¬ 
tem  (AALS)  that  integrates  LIDAR  and  RFID  sensing  on  a  mobile  robot  base.  The 
RFID  system’s  role  was  dual  purpose  in  this  application  -  identifying  both  asset 
and  landmarks  tags  in  close  proximity  to  the  robot  platform.  These  measurements 
enabled  the  position  of  asset  tags  in  the  environment  to  be  estimated  with  a  mean 
error  of  <80  cm.  Furthermore,  they  were  able  to  augment  the  limited  range  of  the 
Hokuyo  URG-04LX  by  not  only  accelerating  the  filter’s  convergence  rate,  but  also 
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ensuring  against  divergence  in  areas  of  low  feature  asymmetry.  A  natural  question 
regarding  this  approach  is  the  use  of  MCL  vs.  SLAM.  This  decision  was  made  so 
that  landmark  tags  with  known  “absolute”  positions  in  the  map  could  readily  be  in¬ 
tegrated  to  protect  against  localization  failures  ( e.g .,  incorrect  loop  closures).  We  are 
currently  investigating  a  hybrid  approach  which  integrates  both  aspects,  and  work¬ 
ing  with  members  of  the  NSF  Center  for  Engineering  Logistics  and  Distribution  to 
evaluate  AALS  in  a  larger  scale,  representative  environment. 
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Active  SLAM  and  Loop  Prediction  with  the 
Segmented  Map  Using  Simplified  Models 


Nathaniel  Fairfield  and  David  Wettergreen 


Abstract  We  previously  introduced  the  SegSLAM  algorithm,  an  approach  to  the 
simultaneous  localization  and  mapping  (SLAM)  problem  that  divides  the  environ¬ 
ment  up  into  segments,  or  submaps,  using  heuristic  methods.  We  investigate  a  real¬ 
time  method  for  Active  SLAM  with  SegSLAM,  in  which  actions  are  selected  in 
order  to  reduce  uncertainty  in  both  the  local  metric  submap  and  the  global  topolog¬ 
ical  map.  Recent  work  in  the  area  of  Active  SLAM  has  been  built  on  the  theoretical 
basis  of  information  entropy.  Due  to  the  complexity  of  the  SegSLAM  belief  state, 
as  encoded  in  the  SegMap  representation,  it  is  not  feasible  to  estimate  the  expected 
entropy  of  the  full  belief  state.  Instead,  we  use  a  simplified  model  to  heuristically 
select  entropy -reducing  actions  without  explicitly  evaluating  the  full  belief  state.  We 
discuss  the  relation  of  this  heuristic  method  to  the  full  entropy  estimation  method, 
and  present  results  from  applying  our  planning  method  in  real-time  onboard  a  mo¬ 
bile  robot. 


1  Introduction 

The  tasks  of  mapping,  localization,  and  planning  lie  at  the  core  of  mobile  robotics, 
and  to  a  large  degree  have  been  solved  for  small,  two  dimensional,  structured  envi¬ 
ronments.  To  make  robots  useful  in  the  broader  world,  they  need  to  move  beyond 
such  simple  environments  into  large,  3D,  unstructured  environments. 

In  response,  there  has  been  recent  work  in  the  SLAM  field  in  two  areas:  submap 
SLAM  and  active  SLAM.  Most  submap  SLAM  methods  use  a  combination  of  met¬ 
ric  and  topological  maps,  in  which  the  relationships  between  submaps  are  repre¬ 
sented  by  the  edges  of  a  graph,  and  the  nodes  of  the  graph  represent  the  submaps. 
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Ground  truth 


Segmented  Map  Map  samples  Loop  detection  via  matching 


Fig.  1  This  figure  illustrates  the  process  of  segmentation,  map  sample  generation,  and  matching. 
The  segmented  map  stores  the  particle  submaps  (shown  as  different  arrows)  for  each  color-coded 
segment.  The  segmented  map  also  stores  the  relationships  between  segments,  loosely  illustrated 
here  by  the  segment  placements  relative  to  each  other.  Note  that  after  matching,  the  breadth-first 
map  sampling  algorithm  does  not  enforce  global  consistency  between  the  red  and  turquoise  seg¬ 
ments. 


The  submap  segmentation  is  usually  designed  such  that  their  scale  is  well  within 
the  capabilities  of  a  SLAM  approach.  Thus  the  scaling  problem  is  addressed,  but 
the  submap  algorithm  must  manage  the  graph  of  submaps,  deciding  when  to  cre¬ 
ate  a  new  submap,  when  to  re-enter  an  old  submap,  and  how  to  represent  different 
hypotheses  about  the  topological  relationships  between  submaps.  In  prior  work,  we 
have  presented  a  robust,  real-time,  submap-based  approach  called  SegSLAM  [Fair- 
field,  2009] .  In  our  SegSLAM  algorithm,  individual  submaps  are  accurate  3D  met¬ 
ric  evidence  grid-based  maps.  Using  an  extension  of  the  Rao-Blackwellized  Particle 
Filter  (RBPF)  formulation  [Doucet  et  al.,  2000],  SegSLAM  maintains  a  stochas¬ 
tic  graph  of  the  transformations  between  submaps,  called  the  segmented  map  or 
SegMap  (Figure  1).  The  particles  of  a  regular  RBPF  are  a  discrete  approximation 
to  the  distribution  over  poses  and  the  metric  maps;  the  particles  of  SegSLAM  are  a 
discrete  approximation  to  the  distribution  over  poses  and  submaps,  where  the  poses 
are  in  the  local  coordinate  frame  of  the  submaps.  Since  each  SegSLAM  particle  has 
its  own  copy  of  each  submap,  we  use  the  noun  segment  to  refer  to  the  collection  of 
particle  submaps  that  are  temporally  compatible:  unlike  RBPF  particles,  SegSLAM 
particles  do  not  encode  a  complete  trajectory  hypothesis,  instead  the  trajectory  must 
be  reconstructed  by  stitching  together  compatible  segments. 

Active  SLAM  determines  the  robot’s  actions  by  planning  based  on  the  SLAM 
uncertainty.  Uncertainty,  which  can  be  quantified  as  information  entropy ,  depends 
on  the  route  that  the  robot  uses  to  explore  the  environment.  A  well-known  approach 
for  balancing  the  need  to  explore  new  regions  against  the  need  to  localize  is  to  es¬ 
timate  the  expected  information  gain  in  the  SLAM  state  distribution  resulting  from 
different  actions  [Feder  et  al.,  1999,  Bourgault  et  al.,  2002,  Burgard  et  al.,  2005]. 
However,  it  is  rarely  possible  to  perform  the  estimates  in  closed  form,  so  compu¬ 
tationally  expensive  Monte-Carlo  simulations  are  used.Rather  than  attempting  to 
estimate  the  expected  information  gain  of  the  full  SLAM  state,  we  show  how  a  sim¬ 
plified  sensor  model  can  be  used  to  probe  the  SLAM  belief  state,  and  how  the  results 
of  these  probes  can  be  then  used  to  select  plans  that  reduce  the  SLAM  uncertainty. 
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A  goal  of  Active  SLAM  is  to  detect  and  close  loops:  when  the  robot  returns  to  a 
place  that  it  has  previously  mapped  and  recognizes  that  it  is  back  in  that  place,  it  can 
correct  for  all  of  the  error  that  has  accumulated  since  it  left.  For  example,  Stachniss 
et  al.  [2004]  perform  active  loop  closure  by  searching  for  discrepancies  between  the 
metric  distance  (which  is  short  near  a  loop  closure)  and  topological  distance  (which 
is  long  just  before  a  loop  closure).  However,  active  loop  closure  doesn’t  necessarily 
predict  loops,  it  just  takes  advantage  of  them  once  the  robot  has  already  observed 
a  possible  loop.  Our  approach  extends  to  true  loop  prediction,  attempting  to  exploit 
the  local  structure  of  the  environment  by  using  the  submaps  of  SegSLAM’s  seg¬ 
mented  map  (SegMap)  as  a  predictive  model.  This  is  related  to  the  work  of  Nabbe 
et  al.  [2004],  who  use  “hallucination”  to  fill  in  gaps  in  a  sparse  map  to  assist  in  esti¬ 
mating  path  costs  for  navigation  over  large-scale  terrains.  Also  related  is  the  work  of 
Chang  et  al.  [2007],  which  relies  heavily  on  repetitive  structure  in  the  environment 
to  match  snippets  from  the  current  map  to  partially  observed  areas,  with  the  idea 
of  not  exploring  regions  that  can  be  predicted  or  of  using  the  prediction  for  local¬ 
ization,  a  questionable  strategy.  Although  not  used  for  prediction,  Fox  et  al.  [2003] 
have  the  most  robust  approach  to  building  a  model  of  the  world,  learning  a  Dirichlet 
prior  over  structural  models  from  a  library  of  previously  explored  environments,  and 
using  samples  from  this  distribution  for  estimating  the  probability  that  the  robot  is 
inside  or  outside  the  known  map. 

We  begin  with  a  description  of  the  information  entropy  formulation  for  Active 
SLAM,  as  well  as  some  of  the  difficulties  of  explicitly  reasoning  about  information 
gain.  We  then  describe  our  method  for  using  simplified  models  to  probe  the  SegMap 
to  find  actions  than  reduce  entropy,  including  the  use  of  the  segmented  map  as  a 
predictive  model  for  generating  routes  that  extend  into  unknown  regions.  Finally,  we 
demonstrate  Active  SLAM  with  predictive  loop  closure  in  a  real  world  experiment. 


2  Information  Gain-Based  Active  SLAM 


Information  entropy  methods  for  Active  SLAM  involves  several  steps.  To  select 
an  action,  an  information  gain  planner  evaluates  different  possible  actions  based 
on  their  expected  information  gain.  For  SLAM,  this  information  gain  is  over  the 
SLAM  belief  distribution,  p(x,  0),  which  includes  both  the  pose  v  and  map  6.  The 
RBPF  SLAM  belief  distribution  is  represented  by  a  set  of  particles,  each  particle 
representing  both  a  map  and  a  position  within  that  map.  As  a  result,  the  RBPF 
SLAM  entropy  can  be  factorized  as 

H[p(X,©)}  =  H(p(X))  +Ex[H(p(@\X))} 

To  estimate  Ex[H[p(0 \X)]\,  we  take  the  average  entropy  of  all  the  particle  maps. 
We  use  an  efficient  3D  evidence  grid  data- structure  called  the  Deferred  Reference 
Count  Octree  [Fairfield  et  al.,  2007],  and  under  the  independence  assumptions  of 
evidence  grids,  the  information  entropy  H  of  of  a  map  6  is  the  sum  of  the  entropies 
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of  each  voxel  0[x,y,z], 

H{9)=  £  -plog(p)-(l-p)log(l-p), 

where  p  =  p(6[x,y,z]).  Since  the  octrees  are  sparsely  populated,  ignoring  unknown 
regions  changes  from  a  requirement  to  a  computational  advantage.  A  further  speed 
improvement  arises  because  the  octree  can  represent  homogeneous  regions  at  a  va¬ 
riety  of  leaf  node  scales. 

Computing  the  entropy  of  the  particle  poses  is  approximated  with  heuristics.  The 
method  described  by  Stachniss  [2006]  approximates  the  cloud  with  a  multivariate 
normal  distribution,  and  then  computes  the  differential  entropy  of  the  distribution. 
This  approximation  is  only  accurate  in  cases  in  which  the  pose  cloud  is  nearly  uni- 
modal,  although  it  will  usefully  report  high  entropy  for  multi-modal  distributions. 
Roy  and  Thrun  [1999]  and  Stachniss  [2006]  refine  this  heuristic  by  computing  its 
average  value  over  multiple  points  in  time,  though  Blanco  et  al.  [2008]  describes 
how  even  this  improved  metric  can  become  undefined  after  closing  a  loop.  This 
is  also  more  complicated  for  SegSLAM,  because  particles  may  be  distributed  over 
different  segments,  each  of  which  has  its  own  coordinate  frame.  A  more  mundane 
problem  is  that  the  entropy  of  the  maps  dominates  the  entropy  of  the  poses,  meaning 
that  a  small  amount  of  exploration  will  result  in  a  greater  change  in  entropy  than  a 
convergence  in  the  particle  poses. 

A  reasonable  simplification  is  to  restrict  the  problem  to  finding  the  best  of  a  set 
of  candidate  actions  rather  than  finding  the  best  of  all  possible  actions.  The  set  of 
candidate  actions  can  be  generated  to  have  the  actions  differ  significantly  and  be 
distributed  evenly  over  the  space  of  all  possible  actions.  Unfortunately,  even  when 
only  considering  a  few  candidate  actions,  this  process  is  computationally  intractable 
for  a  RBPF  with  hundreds  of  particles  and  complex  map  structures.  For  example, 
the  expected  entropy  estimation  process  is  further  complicated  for  SegSLAM,  since 
the  average  is  computed  over  map  samples,  and  there  are  many  more  possible  map 
samples  than  particles. 

Further,  a  fundamental  limitation  of  this  process  is  that  measurements  can’t  be 
predicted  for  unmapped  areas.  It  is  also  questionable  whether  updating  the  (cloned) 
particle  maps  with  the  simulated  measurements  yields  a  better  estimate  of  the  SLAM 
state.  As  a  result,  the  accuracy  of  the  entropy  estimates  will  degrade  over  time:  in 
particular,  they  will  degrade  completely  for  actions  that  enter  unmapped  areas. 

In  the  next  two  sections  we  address  the  two  limitations  of  information  gain-based 
Active  SLAM:  computational  intractability  and  limited  horizon.  First,  we  describe 
our  simplified  method,  and  then  we  describe  how  to  use  prediction  to  usefully  extend 
the  planning  horizon. 
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3  Simplified  Entropy  Heuristic 


In  order  to  be  computationally  tractable,  our  method  eliminates  the  filter  simulation 
and  entropy  evaluation  and  instead,  directly  examines  the  variance  of  the  simulated 
measurements.  Under  certain  constraints,  this  variance  is  a  proxy  for  the  full  entropy 
and  can  be  used  for  Active  SLAM. 

As  before,  the  simulated  measurements  zap  are  generated  by  simulating  the  mo¬ 
tion  of  particle  p  while  executing  action  a ,  and  casting  rays  in  the  map.  The  simu¬ 
lated  measurements  are  determined  by  the  current  SLAM  belief  state,  and  are  cor¬ 
related  with  the  particle  weights  at  least  until  the  particle  filter  would  resample  (if 
we  were  simulating  the  full  particle  filter). 

For  example,  assuming  a  ID  state  and  that  the  particle  pose  distribution  can  be 
approximated  by  a  Gaussian,  we  can  estimate  the  entropy  of  the  particle  poses  for 
action  a  as  the  differential  entropy  of  the  Gaussian 

Ha  =  \og{oaV27te), 

where  oa  is  the  weighted  variance  of  the  particle  positions 

#p 

Gq  =  %/\^wap(xp  ~  AO  )• 

P 

Note  that  the  (xp  —  p)2  are  constant  for  all  actions,  because  all  particles  move  in 
lock-step  under  the  simplified  model.  Using  a  Gaussian  model  for  a  single  range 
sensor,  the  particle  weights  are  given  as 


vqp 


-izap-zY 
7  2(7? 


\/2kg2 


Plugging  in  the  expression  for  wap  and  discarding  constants,  we  have 

Ha  oc  log(^^^-«2) 

p 


and  since  both  log  and  exp  are  monotonic  functions,  if  we  can  minimize  Ha  by 
choosing  the  action  that  maximizes  (zap—z)2,  or  equivalently  to  choosing  the  action 
that  maximizes  the  variance  of  the  simulated  measurements:  the  planning  algorithm 
only  needs  to  estimate  the  relative  variances  to  select  between  different  actions. 
Note  that  variance  in  the  simulated  measurements  arises  from  the  uncertainty  in  the 
SLAM  belief  state  and  is  indicative  of  the  SLAM  entropy,  rather  than  the  likelihood 
of  any  particular  set  of  measurements  under  the  sensor  model. 

Within  the  particle  filter’s  resampling  horizon,  choosing  the  action  that  maxi¬ 
mizes  the  sum- squared  measurement  difference  is  equivalent  to  choosing  the  action 
that  minimizes  the  entropy.  Even  beyond  the  resampling  horizon,  when  some  of  the 
assumptions  above  break  down,  the  measurement  variance  criteria  leads  to  good  ac- 
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Fig.  2  Left,  a  diagram  of  a  SegMap  with  two  particles  and  three  segments:  the  transforms  be¬ 
tween  segments  are  represented  by  circles.  Center,  three  different  map  samples  generated  from  the 
SegMap.  Right,  a  map  sample  is  extended  by  three  guesses  (in  yellow),  which  are  transformed 
segments  attached  at  the  current  vehicle  position  with  a  small  random  perturbation. 


tion  recommendations.  Because  we  do  not  need  to  precisely  simulate  the  full  state 
of  the  SLAM  algorithm,  we  can  use  simplified  sensor  models  and  larger  motion  step 
sizes,  yielding  a  smoothly  degrading,  computationally  fast,  proxy  for  the  full  SLAM 
entropy  estimate. 


4  Useful  Map  Prediction 


The  overall  goal  of  Active  SLAM  is  to  find  actions  that  reduce  the  uncertainty  in 
both  mapping  and  localization.  Closing  loops  is  crucial  for  SLAM,  because  when 
the  algorithm  detects  that  it  has  returned  to  a  previously  mapped  area,  the  accumu¬ 
lated  error  can  be  canceled.  It  would  be  very  beneficial  if  Active  SLAM  could  select 
actions  that  were  expected  to  close  loops:  the  difficulty  is  that  making  any  non-trivial 
loop-closure  predictions  involves  simulating  measurements  in  unmapped  regions. 
Rather  than  assuming  all  unknown  space  is  empty,  or  occupied,  or  using  some  other 
prior  model,  our  method  directly  predicts  the  structure  of  the  environment  in  the 
unmapped  area  by  using  nearby  previously  constructed  map  segment. 

In  the  SegMap  there  are  multiple  particle  maps  for  each  segment,  or  submap 
(Figure  2).  Different  particle  maps  can  be  stitched  together  to  yield  metric  map 
samples.  The  SegMap  represents  the  SLAM  belief  distribution,  in  the  sense  that 
for  regions  where  SegSLAM  has  low  uncertainty  different  map  samples  will  be 
very  similar,  but  for  regions  where  SegSLAM  is  uncertain,  variation  in  the  metric 
submaps  and  the  transformations  between  them  will  yield  significant  variation  in 
the  map  samples. 

To  simulate  measurements  we  generate  metric  map  samples  that  include  each 
particle’s  current  segment,  simulate  the  motion  of  the  vehicle  within  the  map  sam¬ 
ple,  and  cast  rays.  We  can  extend  map  sample  generation  to  predict  unmapped  re- 
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gions  by  making  the  assumption  that  nearby  segments  are  good  predictors  of  unob¬ 
served  areas  (Figure  2).  We  use  the  same  process  as  for  generating  map  samples, 
with  the  addition  of  a  step  that  randomly  selects  a  nearby  segment,  and  grafts  its 
start  position  onto  the  current  position  with  some  random  perturbation  or  refine¬ 
ment  from  map  matching.  We  will  call  this  grafted  segment  a  “guess,”  because  it  is 
a  weaker  hypotheses  about  the  map  structure. 

The  question  is  then  how  to  make  use  of  the  “guess”  yielded  by  the  grafted  seg¬ 
ment.  If  the  guess  conflicts  with  the  “known”  portions  of  the  map  sample  due  to  the 
regular  segments,  it  is  useless.  If  the  guess  extends  into  unmapped  regions,  it  can  be 
used  to  generate  informed  plans  (for  example  allowing  plans  to  extend  down  a  hall¬ 
way).  But  the  real  value  of  a  guess  is  when  it  connects  known  regions  while  bridging 
across  unknown  regions:  this  allows  Active  SLAM  to  close  predicted  loops. 

To  accelerate  the  process  of  finding  plans,  our  planner  also  uses  the  following 
simplifications.  It  first  generates  a  local  map  sample  and  fuses  the  submaps  into  a 
single  evidence  grid  map.  It  then  generates  a  guess  by  selecting  a  random  segment 
to  append  to  the  vehicle’s  current  position,  as  well  as  a  small  random  perturbation 
(Figure  2).  This  guess  segment  includes  the  vehicle’s  original  trajectory  through  the 
segment,  so  to  quickly  check  the  plausibility  of  the  guess,  our  planner  queries  the 
fused  map  along  the  (transformed)  trajectory  to  verify  that  the  guessed  trajectory 
starts  in  known  empty  space,  goes  into  unknown  space,  and  returns  to  known  empty 
space.  If  the  guess  passes  this  quick  check,  it  is  grafted  into  the  fused  map,  and  the 
planner  uses  the  vehicle  model  to  see  if  the  vehicle  can  pass  through  the  resulting 
map.  If  the  vehicle  can  go  from  known  empty  space  to  grafted  empty  space  and 
back  to  known  empty  space  without  collision,  the  plan  is  considered  a  success  for 
the  map  sample. 

In  the  next  section,  we  present  from  using  this  approach  to  Active  SLAM  with 
loop  prediction  in  a  real-world  experiment. 


5  Active  SLAM  Experiment 


Beneath  the  Field  Robotics  Center  highbay  is  a  network  of  tunnels  known  as  the 
catacombs.  The  navigable  area  of  the  catacombs  forms  a  square  figure  eight  shape 
(Figure  3).  We  used  Cave  Crawler  [Morris  et  al.,  2006]  to  explore  the  tunnels,  which 
are  are  just  wide  enough:  in  some  cases  the  wheels  rub  on  both  sides,  so  we  manually 
drove  Cave  Crawler  in  a  close  approximation  of  the  planned  trajectory.  We  limited 
the  laser  ranges  to  a  maximum  of  4  m  so  that  returns  would  not  reach  the  end  of  the 
tunnel  segments.  As  the  simplified  sensor  model  for  this  Active  SLAM  experiment, 
we  used  a  simple  binary  collision  model  that  indicated  traversability. 

Cave  Crawler  started  at  one  end  of  the  figure  eight,  and  used  an  RRT  planner  to 
find  exploration  actions  with  a  bias  toward  moving  in  straight  lines.  After  driving 
down  one  side,  past  the  crossing  tunnel  and  around  the  bottom  of  the  loop,  Cave 
Crawler  encountered  the  other  end  of  the  crossing  tunnel  (Figure  4).  At  this  point, 
it  had  three  or  four  segments  (varying  between  different  runs)  with  which  to  make 
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Fig.  3  Overview  of  the  catacombs  environ¬ 
ment  -  the  long  axis  of  the  figure  eight  is  about 
20  m,  and  the  short  axis  is  about  10  m.  In  sev¬ 
eral  places  Cave  Crawler  can  barely  fit  through 
the  narrow  tunnels. 


Fig.  4  In  both  experiments,  Cave  Crawler 
started  at  one  end  of  the  figure  eight  (upper 
right)  and  then  proceeded  along  one  side  and 
around  the  bottom.  The  colored  pointclouds 
correspond  to  the  three  segments. 


guesses,  and  40  particle  submaps  for  each  segment.  In  this  experiment,  we  evalu¬ 
ated  guesses  as  described  above,  and  checked  the  collision  sensor  model  in  10  map 
samples,  considering  the  maximum  variance  criteria  to  be  satisfied  if  the  binary 
outcome  of  the  collision  model  indicated  traversability  in  4-7  samples  (Figure  5). 
Most  guesses  result  in  collisions  and  rarely  have  success  rates  above  2/10.  But  some 
guesses  successfully  bridged  the  middle  segment  of  the  figure  eight,  and  the  success 
rate  of  actions  across  these  bridges  depended  on  the  SegSLAM  uncertainty. 

To  control  the  SLAM  uncertainty,  we  artificially  increased  the  motion  model 
noise.  When  the  SegSLAM  uncertainty  was  high  the  bridging  actions  had  success 
rates  of  4/10  or  6/10,  and  following  the  max  variance  criteria  (relative  to  simply 
continuing  straight)  Cave  Crawler  took  the  action  and  closed  the  loop  (Figure  6: top). 
In  the  runs  with  low  SegSLAM  uncertainty,  the  bridging  actions  had  either  high  or 
low  success  rates,  and  Cave  Crawler  did  not  cross  the  center  tunnel,  continuing  to 
follow  the  usual  exploration  actions  up  and  around  the  figure  eight  to  close  the  loop 
(Figure  6:bottom). 

Using  our  method,  Cave  Crawler  was  able  to  actively  predict  and  close  a  loop 
in  real-time  at  full  safe  vehicle  velocity,  about  0.2  m  per  second  in  this  constrained 
environment. 


6  Conclusions 


Our  approach  to  Active  SLAM  can  be  summarized  as  using  simplified  models  to 
heuristically  estimate  the  entropy  of  the  SegMap.  In  particular,  the  method  probes 
the  SegMap  by  analyzing  the  measurements  generated  by  a  simple  sensor  model 
applied  to  multiple  map  samples.  We  have  shown  that  this  approach  is  equivalent 
to  information-gain  estimation  when  the  planning  horizon  is  within  the  resampling 
period  of  the  particle  filter,  and  we  have  experimentally  found  that  produces  rea- 
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Fig.  5  Action  votes  over  the  period  of  time  when  Cave  Crawler  approached  the  crossing  tunnel, 
for  both  the  high  and  low  uncertainty  runs. 


Fig.  6  Top:  High  uncertainty  run:  the  guessed  path  through  the  center  tunnel  (left)  had  a  success 
rate  of  6/10,  which  satisfied  the  maximum  variance  criteria.  After  taking  this  action,  SegSLAM 
closed  the  inner  loop  (right).  Bottom:  Low  uncertainty  run:  the  guessed  path  through  the  center 
tunnel  (left)  had  a  low  variance  success  rate  of  9/10.  The  vehicle  continued  to  follow  the  outside 
of  the  figure  eight,  and  SegSLAM  closed  the  outer  loop  (right). 


sonable  plans  even  beyond  this  horizon.  We  also  showed  how  to  use  the  SegMap  as 
a  predictive  model  to  hypothesize  about  unseen  regions  by  using  nearby  segments 
as  priors.  Integrating  all  these  ideas,  we  demonstrated  active  SLAM  in  real-time 
onboard  Cave  Crawler. 
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In  future  work,  we  would  like  to  compare  our  method  with  other  information- 
gain  heuristics  over  a  range  of  structured  and  unstructured  3D  environments. 

We  would  particularly  like  to  thank  Chuck  Whittaker  for  his  help  getting  Cave 
Crawler  into  the  catacombs. 
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Outdoor  Downward-facing  Optical  Flow 
Odometry  with  Commodity  Sensors 


Michael  Dille,  Ben  Grocholsky,  and  Sanjiv  Singh 


Abstract  Positioning  is  a  key  task  in  most  field  robotics  applications  but  can  be 
very  challenging  in  GPS-denied  or  high-slip  environments.  A  common  tactic  in 
such  cases  is  to  position  visually,  and  we  present  a  visual  odometry  implementa¬ 
tion  with  the  unusual  reliance  on  optical  mouse  sensors  to  report  vehicle  velocity. 
Using  multiple  kilometers  of  data  from  a  lunar  rover  prototype,  we  demonstrate  that, 
in  conjunction  with  a  moderate-grade  inertial  measurement  unit,  such  a  sensor  can 
provide  an  integrated  pose  stream  that  is  at  times  more  accurate  than  that  achievable 
by  wheel  odometry  and  visibly  more  desirable  for  perception  purposes  than  that 
provided  by  a  high-end  GPS-INS  system.  A  discussion  of  the  sensor’s  limitations 
and  several  drift  mitigating  strategies  attempted  are  presented. 


1  Introduction 

Accurate  knowledge  of  position  is  critical  to  successful  completion  of  field  robotics 
tasks.  In  known  or  highly  structured  environments,  localization  relative  to  a  pre¬ 
determined  or  progressively -refined  map  is  typically  performed  using  sensors  ap¬ 
propriate  for  registering  map  features  to  observations.  In  general  outdoor  scenar¬ 
ios,  absolute  positioning  using  Global  Positioning  and  Inertial  Navigation  systems 
(GPS-INS)  is  frequently  performed,  often  in  conjunction  with  input  from  odometry 
integration,  and  may  augmented  further  by  continuous  registration  of  local  terrain 
or  obstacle  maps.  Indeed,  recent  GPS-INS  devices  advertise  errors  as  low  as  a  few 
centimeters  in  position  and  hundredths  of  a  degree  in  attitude  after  alignment  [13]. 

Many  applications  present  significant  challenges  for  these  positioning  strategies. 
GPS  may  be  unavailable  or  ineffective  if  too  few  satellites  are  visible  during  urban  or 
subterranean  operations.  Wheel-based  odometry  depends  on  an  accurate  kinematic 
model  and  can  degrade  greatly  in  the  presence  of  wheel  slip  typical  of  low-friction 
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surfaces  and  skid- steered  vehicles.  In  poorly  lit  or  very  low-texture  environments, 
systems  based  on  vision  will  see  reduced  performance. 

Planetary  rover  missions  pose  particularly  challenging  cases.  GPS  is  unavail¬ 
able  and  alternatives  such  as  star  tracking  do  not  offer  comparable  accuracy.  Loose 
surface  dust  easily  impedes  wheel  odometry.  Further,  visual  odometry  and  terrain 
mapping  methods  are  frustrated  by  poor  lighting,  few  visual  surface  features,  and 
tight  computational  constraints  limiting  their  implementation  complexity. 

The  specific  need  we  seek  to  fulfill  is  that  of  a  pose  estimation  system  for  Scarab, 
the  lunar  rover  prototype  developed  at  Carnegie  Mellon  University  pictured  in  Fig¬ 
ure  1  [4]  [20].  This  280kg  skid-steered  vehicle  is  designed  to  explore  permanently 
shadowed  lunar  polar  craters,  which  combines  the  greatest  weaknesses  of  most  posi¬ 
tioning  methods:  the  unavailability  of  GPS,  constant  near- total  darkness,  occlusion 
of  most  stars  by  the  high  crater  walls,  soft  lunar  regolith,  and  strictly  limited  com¬ 
puting  and  power  facilities.  Exclusive  reliance  on  wheel  odometry  was  first  planned 
but  then  abandoned  in  recognition  of  the  errors  that  would  be  induced  by  surface 
slip,  the  eventual  implementation  of  softer  wheels  that  would  thereby  be  of  vary¬ 
ing  radius,  and  the  passive  differenced  rocker  suspension,  which  while  excellent 
for  maintaining  stability  on  bumpy  terrain,  makes  estimating  heading  changes  from 
odometry  inadequate.  Certainly  strategies  can  be  applied  for  detecting  slip  some 
proportion  of  the  time — and  [16]  provides  several  methods  for  doing  just  this  on 
planetary  rovers — but  the  design  of  Scarab’s  suspension  frustrates  this  approach 
and  arguably  more  complicated  physical  modeling  might  best  be  replaced  by  an 
alternative  sensing  modality. 

Conventional  forward-facing  visual  odometry  as  used  on  the  Mars  Exploration 
Rovers  (MERs)  [11]  was  considered  but  deemed  impractical  because  lighting  the 
surrounding  area  would  require  more  power  than  the  rover  can  provide  and  induce 
complex  shadows.  Instead,  we  decided  upon  using  downward-facing  visual  odom¬ 
etry.  This  provides  tractable  lighting  requirements,  however  these  forward-facing 
techniques  cannot  be  directly  applied  because  the  situation  is  ill-posed  to  compute 
the  3-D  incremental  pose  differences  and  the  terrain  beneath  the  rover  is  likely  to  be 
too  homogeneous  for  point-based  feature  tracking  to  work  effectively.  Rather,  we 
chose  to  rely  on  optical  flow  to  provide  an  estimate  of  vehicle  speed,  which  could 
then  be  integrated  to  estimate  incremental  distance-traveled  as  part  of  a  broader 
odometry  framework.  Various  methods,  such  as  the  Horn  &  Schunck  algorithm  [7], 
have  long  existed  for  computing  so-called  dense  optical  flow  over  regions  between 
images.  Such  algorithms  have  been  used,  for  instance,  for  autonomous  heading  con¬ 
trol  for  obstacle  avoidance  for  fixed-wing  aircraft  [21],  estimating  distance  to  the 
ground  and  canyon  walls  from  unmanned  aerial  vehicles  [6],  and  autonomous  land¬ 
ing  for  helicopters  [17].  In  the  odometry  realm,  downward-facing  cameras  have 
been  successfully  used  for  positioning  in  pre-explored  environments  by  correlating 
the  visible  area  against  an  existing  database  [8],  however  such  methods  are  inappli¬ 
cable  to  planetary  rovers  observing  most  patches  for  the  first  time. 

Although  an  optical  flow  implementation  using  a  typical  camera  would  have  been 
straightforward,  an  intriguing  alternative  presented  itself  in  the  form  of  a  custom- 
built  optical  flow  sensor  from  AirRobot  GmbH  [1]  used  for  stabilization  on  its  quad- 
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rotor  helicopters.  This  device,  shown  in  Figure  2,  contains  four  commodity  optical 
mouse  sensors  each  attached  to  a  lens  of  a  different  focal  length  and  reports  2-D 
velocity  (in  m/s)  along  the  planar  surface  at  which  it  is  pointed  and  an  estimate 
of  distance  to  this  plane.  The  precise  details  of  its  operations  are  proprietary  to 
the  manufacturer,  but  given  that  a  scalar  “tracking  quality”  value  is  known  to  be 
returned  by  mouse  sensors,  we  conjecture  that  height  is  derived  by  interpolating 
across  the  quality  values  returned  by  each  of  the  four  sensors,  with  speed  similarly 
interpolated  across  the  four  focal-distance-compensated  speed  values.  The  sensor 
nominally  reports  velocity  with  a  resolution  of  0.3mm/s  within  a  range  of  ±10m/s 
and  height  up  to  2.5m  with  a  resolution  of  1cm. 


Fig.  1  The  Scarab  lunar  rover.  The  glow  beneath  Fig.  2  The  optical  sensor  used  on  Scarab,  in  a 
is  from  LED  lighting  for  the  optical  flow  sensor,  custom  ruggedized  enclosure. 


The  use  of  optical  mouse  sensors  for  measuring  ground  velocity  has  a  number  of 
benefits  leveraging  over  a  decade  of  commercial  refinement.  Key  among  these  are 
remarkably  robust  operation  on  a  wide  variety  of  surfaces,  significant  lighting  insen¬ 
sitivity,  and  extremely  low  cost  (several  US  dollars)  due  to  the  volume  in  which  they 
are  produced.  These  typically  contain  a  15  to  100  pixels-square  camera  sensor  and 
are  believed  to  implement  a  fast  hardware  version  of  Horn  &  Schunck  [14]  reporting 
sub-pixel  flow  rates  at  on  the  order  of  1kHz.  Designs  based  on  laser  interferometry 
are  becoming  prevalent,  however  they  are  not  as  adaptable  as  they  do  not  use  simple 
lenses.  Due  to  their  low  cost  and  simplicity,  there  exists  wide  interest  in  the  hobbyist 
robotics  community  in  these  sensors.  Several  examples  of  their  use  in  limited  indoor 
scenarios  exist  in  the  robotics  literature  (eg  [15],  [18],  and  [19]),  but  they  have  yet 
to  see  use  in  a  field  robot  application.  A  method  similar  to  that  described  here  using 
a  webcam  and  image  correlation  matching  was  presented  in  [12]. 

In  this  paper,  we  describe  our  odometry  method  using  this  sensor,  present  results 
from  extensive  field  testing,  and  draw  conclusions  on  the  effectiveness  of  commod¬ 
ity  optical  flow  sensors  for  pose  estimation. 

2  Odometry  Method 

We  first  assume  that  the  vehicle  can  only  instantaneously  move  along  its  current 
heading  vector,  or  equivalently  assume  the  non-holonomic  constraint  of  no  wheel 
side-slip.  We  believe  this  to  be  valid  because  except  for  rare  cases  such  as  sliding 
laterally  down  a  slope,  wheel  slip  should  occur  only  when  skidding  in  place  during 
turns  or  failing  to  grip  the  ground  during  forward  travel.  Position  is  integrated  based 


4 


Michael  Dille,  Ben  Grocholsky,  and  Sanjiv  Singh 


on  the  current  linear  speed  and  heading  vector  at  all  times.  Linear  speed  is  estimated 
from  optical  sensor  readings,  and  the  heading  vector  comes  from  the  attitude  deter¬ 
mined  by  integrating  IMU  angular  rate  values.  Uncertainty  in  the  integrated  pose  is 
propagated  by  modeling  uncertainty  for  the  optical  sensor  and  the  IMU.  The  next 
section  formalizes  this  procedure  followed  by  methods  for  improved  accuracy. 

2.1  Basic  Odometry  Model 


We  begin  by  defining  the  vehicle  position  x  and  orientation  6  in  a  chosen  world 

fro  YY\  P  * 

x,  =  [x,y,z]T,  e,  =  1 0x,0y,0z]T  =  q,  =  [qs,qx,qy,qz]T  (1) 

where  the  latter  equivalence  denotes  that  we  may  refer  to  the  orientation  as  the  Euler 
angles  0t  or  a  unit  quaternion  qt. 

At  each  time- step,  the  IMU  provides  angular  rates  and  linear  accelerations  in  the 
body  frame: 

COyj  (Oz\ ,  a t  —  [CLX ,  CLy ,  CLZ\ .  (2) 

For  a  given  state  xt  we  may  then  write  the  vehicle  unit  heading  vector  in  the 
world  frame.  The  non-holonomic  velocity  constraint  can  then  be  embedded  in  the 
definition  of  the  vehicle’s  linear  velocity  as 


V*  =  VbodyUt  (3) 

where  Vbody  is  the  scalar  linear  velocity,  which  is  assumed  to  be  directly  provided 
by  the  optical  sensor.  In  actuality,  it  reports  velocity  in  two  dimensions,  but  to  avoid 
having  to  very  precisely  calibrate  for  any  rotational  offset,  we  simply  take  the  norm 
of  the  velocity  vector  it  reports  to  be  the  speed  and  determine  the  sign  from  that  of 
the  velocity  reported  along  the  axis  most  nearly  aligned  with  the  body. 

At  time  t  +  1,  given  v?+i  and  the  previous  values  xt  and  Vt,  the  new  position  may 

be  estimated  as  1 

x*+i  =  xt  +  -Af(v,+i  +  vr),  (4) 

where  At  is  the  time  elapsed  between  t  and  t  +  1,  computed  via  trapezoidal  inte¬ 
gration  of  the  velocity  vector.  A  similar  procedure  for  orientation  is  used.  Namely, 

q,  =  \qt*[0,\{(O,  +  (Ot+l)]T 

(\t+\  =  +  Ati{t 

estimates  the  new  orientation,  where  *  denotes  quaternion  multiplication  [10]. 

Dead-reckoning  error  uncertainty  is  propagated  by  the  linearized  covariance  pre- 
diction  equation 1  Pfc+1  =  F,P*F[  +  G*Q*GjC,  (6) 

where 


1  0  — Vsin  xyAt 

cos2  y/Oy  At  +  ^V2  sin2  y/Oco  At3  cos  i/^sin  —  5  V2o^At3)  —^Vsiny/o %At2 

F  = 

0  1  Vcosl if  At 

0  0  1 

and  GQGr  = 

cos  V'sin  yt(OyAt  - \v2  o^At3)  sin2  y/OyAt  +  V2cos2yrol)At3  ^Vcosy/o^At2 

—  2  Vsin  Xj/OflAt2  2VcosXl/<7ci)At2 

Approximate  trends  in  state  uncertainty  growth  over  time  can  be  observed  by 
reducing  Equation  6  to  straight  line  constant  speed  motion  with  inital  covariance 
=  diag(Px  Py  P\jf).  For  x-axis-aligned  motion,  Equation  7  indicates  along-track 

1  For  brevity,  we  here  present  uncertainty  propagation  for  planar  location  and  heading.  The  equa¬ 
tions  for  the  full  6-D  model  as  we  have  implemented  are  a  straightforward  extension. 
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and  heading  error  increases  proportional  to  the  square-root  of  elapsed  time.  Cross¬ 
track  error  exhibits  faster  error  growth,  linear  in  time  due  to  initial  heading  uncer¬ 
tainty  plus  growth  at  three-halves  power  of  time  due  to  heading  rate  noise.  This 
highlights  the  importance  of  low  heading-rate  uncertainty  in  achieving  accurate 
dead-reckoning. 


o 

o 
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AtOy 

0 
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0  Vyk  +  V2At2P«,,kVAtV«f,k 

+ 

0 

\V2At^l 
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hd 
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0 
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Distance  and  heading  error  drift  rates  for  the  system  have  been  determined 
through  ground-truthed  experimental  trials.  Figure  3  shows  the  difference  in  changes 
observed  between  the  dead-reckoning  solution  and  ground-truth  for  varying  length¬ 
time  ensembles.  Twenty  thousand  samples  were  compared  over  a  two  hour  period 
at  thirty  second  ensemble  increments.  This  trial  indicates  drift  rates  of  1.2 m/y/hour 
in  distance  traveled  and  2.5° / Vhour  in  heading. 


Fig.  3  Estimated  drift  rate  for  distance  traveled  (left)  and  heading  ( right)  determined  by  comparing 
changes  over  increasing  time  ensembles  between  the  dead-reckoning  solution  and  ground-truth. 


2.2  Improved  Attitude  Estimation 


While  the  error  in  the  pose  estimate  provided  by  the  above  integration  procedure  will 
necessarily  grow  without  bound,  several  simple  measures  can  be  taken  to  greatly  re¬ 
duce  attitude  error.  Many  of  these  are  intuitive  and  even  commonplace  in  commer¬ 
cial  navigation  systems,  however  they  receive  rare  mention  in  the  robotics  literature 
and  are  too  often  ignored  in  implementation. 

A  first  strategy  critical  for  long-term  operation  is  to  remove  the  approximately 
15  degrees  per  hour  on  Earth  (or  0.56  on  the  moon  [2])  angular  velocity  the  gy¬ 
roscope  will  inherently  pick  up  due  to  the  rotation  of  the  earth,  which  is  done  by 
rotating  the  known  angular  velocity  of  the  planet  into  the  local  coordinate  system 
and  subtracting.  When  stationary,  this  measured  angular  velocity  may  even  be  used 
to  perform  gyrocompassing.  The  gyroscope  bias  may  be  computed  by  averaging  the 
measured  angular  velocity  vector  while  stationary  (after  subtracting  the  planetary 
rotation  rate).  With  some  IMUs  this  may  needed  as  often  as  every  ten  minutes  for 


6 


Michael  Dille,  Ben  Grocholsky,  and  Sanjiv  Singh 


even  short-term  performance  [5].  This  step  was  not  required  in  our  case  since  this 
value  typically  averaged  to  be  negligible. 

Another  common  technique  is  to  use  the  gravity  vector  (weaker  but  still  useful 
at  about  0.16g  on  the  moon  [3])  as  measured  by  the  IMU’s  accelerometers  when 
stationary  to  estimate  roll  and  pitch  by  taking  arctangents  of  the  accelerations  along 
the  axes.  We  made  use  of  this  to  reset  drift  in  roll  and  pitch  during  stationary  periods. 
Given  now  two  complementary  sources  of  roll  and  pitch-this  direct  computation 
most  accurate  when  the  attitude  is  slowly  changing  and  integration  of  angular  rates 
just  the  opposite-we  implemented  a  matched  complementary  Butterworth  filter  pair 
to  continuously  merge  these  two  streams,  with  a  cutoff  frequency  of  0.05Hz  found  to 
be  most  appropriate  given  the  very  slow  motion  of  the  rover.  This  provides  excellent 
results  with  errors  in  roll  and  pitch  of  typically  much  less  than  a  degree  at  all  times. 

Finally,  anticipating  missions  consisting  of  long  stationary  periods  (primarily  in¬ 
tended  for  battery  recharging)  followed  by  short  periods  of  motion,  we  additionally 
clamped  the  measured  angular  velocity  vector  to  zero  when  the  vehicle  is  known  to 
be  stationary  to  avoid  blatantly  unnecessary  noise  integration. 

2.3  Velocity  Scale  Calibration 

Ideally,  the  optical  flow  sensor  returns  a  correctly  scaled  lateral  speed  regardless 
of  the  distance  between  the  sensor  and  the  ground.  We  model  the  actual  corrupted 
scaled  velocity  returned  as 

V measured  =  s{^)  '  V actual  +  b  +  V,  (8) 

where  s(h)  is  a  scaling  factor  that  varies  with  the  height  h ,  and  V  is  noise.  At  most  a 
trivial  bias  term  b  was  observed,  and  this  was  effectively  dealt  with  by  clamping  the 
measured  velocity  to  zero  when  the  vehicle  is  known  to  be  stationary. 

The  height  of  the  sensor  is  not  fixed  as  Scarab  is  equipped  with  an  active  sus¬ 
pension  used  to  lower  the  body  so  that  a  core-drilling  apparatus  may  operate.  Cali¬ 
bration  trials  at  varying  body  heights  were  performed.  Figure  9  shows  a  number  of 
such  runs  over  distances  varying  between  10  and  30  meters.  As  it  indicates,  rather 
different  scaling  values  were  found  for  daytime  and  nighttime  operation,  but  within 
each  class,  a  simple  quadratic  fit  provides  reasonable  compensation. 

The  sensor  also  exhibited  a  scale  dependence  on  the  surface  type  with  values 
tightly  clustered  for  a  given  surface.  An  online  auto-calibration  scheme  proved  quite 
effective  by  relying  on  the  observation  that  during  consistent  straight-line  motion, 
wheel  odometry  can  be  very  accurate.  Every  three  seconds,  a  procedure  runs  that 
tests  a  battery  of  heuristics  to  decide  whether  the  wheel  odometry  from  that  period 
is  likely  to  be  trustworthy.  This  includes  for  instance  verifying  that  each  wheel’s 
velocity  agrees  closely  with  the  average,  all  agree  on  direction,  the  reported  velocity 
is  above  a  noise  floor,  and  that  the  IMU  is  reporting  minimal  yaw  rates.  Primar¬ 
ily,  this  eliminates  periods  including  heavy  slip  or  turning  (implying  likely  wheel 
odometry  error).  Each  period  is  added  as  a  learned  data  point  over  which  a  variation 
of  a  recursive  locally- weighted  linear  least  squares  algorithm  is  run  to  compute  the 
scaling  factor  as  a  function  of  body  height.  Each  time  a  new  velocity  value  from  the 
optical  sensor  is  received,  the  best  estimate  of  the  scaling  factor  for  that  height  is 
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used.  If  insufficient  calibration  data  points  are  available  for  the  region  surrounding 
that  height  the  scaling  factor  determined  by  the  original  manual  calibration  trials  is 
used  as  a  fall-back. 

An  example  run  using  the  online  scale  calibration  procedure  is  shown  in  Figures 
10  and  11,  which  indicates  that  this  process  performs  well,  though  unsurprisingly 
not  as  well  as  a  post-hoc  batch  method  computing  the  scaling  function  from  all  the 
data  points  at  once  (shown  in  Figure  12).  However,  it  performed  well  enough  and 
eliminated  most  difficulties  associated  changing  surfaces.  All  data  presented  uses  it 
unless  otherwise  indicated. 

3  Experimental  Results 

During  the  course  of  system  development  and  verification,  we  collected  tens  of 
hours  of  data  from  field  testing  over  several  kilometers  of  traversal,  including  ex¬ 
tended  simulated  lunar  terrain  at  Moses  Lake,  Washington  and  Mauna  Kea,  Hawaii. 
Field  tests  were  conducted  both  during  daytime  and  in  total  darkness,  for  which 
high-intensity  LED  bars  were  mounted  to  the  vehicle’s  underbelly  to  illuminate  the 
area  seen  by  the  optical  sensor. 

Anecdotally,  the  optical  sensor  worked  remarkably  well  across  a  wide  variety  of 
surfaces,  including  dirt,  mud,  grass,  asphalt,  and  sand.  Tracking  was  poor  on  poured 
concrete  surfaces  and  nearly  useless  on  painted  concrete.  Most  of  the  extended  field 
testing  took  place  over  sand  to  best  emulate  lunar  terrain,  and  it  was  noted  that 
typical  auto-calibrated  scaling  factors  were  somewhat  higher  than  for  other  surfaces, 
likely  explainable  by  somewhat  poorer  tracking  on  this  relatively  featureless  terrain. 

As  the  Honeywell  HG1700  IMU  used  for  odometric  integration  is  part  of  a  No- 
vaAtel  SPAN  GPS-INS  system,  a  convenient  source  of  approximate  ground  truth 
was  also  available.  At  the  slow  speeds  the  rover  moves,  INS  position  errors  on  the 
order  of  tens  of  centimeters  are  large  relative  to  short-term  distances  traveled,  how¬ 
ever  over  long  distances,  these  relatively  static  uncertainties  are  small  compared  to 
accumulated  odometry  error. 

In  the  plots  below,  full  odometry  results  are  provided  as  well  as  the  result  of  using 
INS -provided  attitude  (the  output  of  the  INS  Kalman  filter  that  uses  GPS  data)  in 
place  of  integrated  attitude.  The  purpose  of  this  is  to  independently  demonstrate  the 
effectiveness  of  the  optical  sensor  (accumulating  just  distance-traveled  error  in  the 
odometry)  and  because  further  methods  of  improving  heading  accuracy  would  be  a 
priority  in  any  future  implementation. 

An  early  observation  was  that  even  when  the  odometry-derived  pose  drifted  sig¬ 
nificantly,  as  shown  in  Figure  8  having  atypically  high  gyro  drift,  the  smoothness 
of  the  integrated  solution  was  much  more  useful  to  the  on-board  perception  system 
than  that  from  the  INS.  Laser  scanners  are  used  to  build  local  terrain  and  obsta¬ 
cle  maps.  Pose  jumps  present  false  obstacles  without  resorting  to  complicated  reg¬ 
istration  algorithms  precluded  by  limited  on-board  computing.  While  a  map  built 
from  the  integrated  solution  will  not  be  globally  correct,  the  resulting  higher-fidelity 
short-term  maps  are  more  valuable  for  local  motion  planning. 
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As  odometry  error  is  inherently  path  dependent,  errors  accumulated  over  differ¬ 
ent  trials  varied,  however,  the  results  shown  in  Figure  6  are  representative.  Data 
from  this  run,  a  nearly  700m  nighttime  traverse  at  Moses  Lake,  is  also  used  as  the 
examples  for  the  described  attitude  improvement  and  scale  calibration  methods.  At¬ 
titude  comparisons  between  the  INS-reported  values  and  integrated  results  are  given 
in  Figure  5.  Traces  of  easting,  northing,  and  heading  uncertainty,  propagated  using 
the  model  described  and  the  estimated  covariances  on  angular  rates  and  optical  flow 
velocity,  are  provided  in  Figures  13-15.  Though  this  is  just  one  example,  the  errors 
lie  within  the  uncertainties,  lending  credence  to  our  propagation  model  and  process 
covariances. 

4  Conclusions  and  Future  Work 


In  this  paper,  we  presented  an  implementation  of  vehicle  odometry  for  a  lunar  rover 
prototype  using  an  optical  mouse  sensor  to  provide  vehicle  velocity.  Results  show 
that  with  a  moderate-grade  IMU,  errors  can  be  small  over  long  distances.  Clearly  the 
weakest  point  in  such  a  design  is  the  accumulation  of  heading  error,  the  reduction 
of  which  would  be  a  key  focus  of  future  implementation.  The  lunar  application 
may  present  some  such  reduction  opportunities,  perhaps  via  the  stop-start  nature  of 
motion  during  missions  or  observation  from  an  orbiter.  Other  avenues  of  exploration 
include  using  multiple  sensors  to  track  heading  or  relax  the  kinematic  assumptions 
[9]  and  designing  a  new  sensor  with  focal  lengths  tuned  for  the  mounting  height. 


Yaw  vs.  time 


Yaw  error  vs.  time 


Fig.  4  Comparison  of  INS-reported  and  inte-  Fig.  5  Error  in  integrated  yaw  angle  relative  to 
grated  yaw  angle  after  application  of  described  INS-reported  value  after  application  of  describe 
attitude  integration  imnrovements.  attitude  integration  imDrovements. 


Overhead  position  trace 


Altitude  vs.  time 


Fig.  6  Overhead  comparison  of  odometry  results  Fig.  7  Altitude  comparison  of  odometry  results 
against  INS-reported  position  using  both  inte-  against  INS-reported  position  using  both  inte¬ 
grated  angular  rates  and  INS-reported  attitude,  grated  angular  rates  and  INS-reported  attitude. 
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Initial  manual  velocity  scale  calibration 


Fig.  8  Example  run  in  which  even  the  signifi-Fig.  9  Optical  velocity  scale  calibration  data 
cantly  drifting  integrated  estimate  is  preferable  to  over  a  number  of  short  straight-line  runs, 
a  very  jumpy  GPS  signal  for  perception  purposes. 


Fig.  10  Overhead  comparison  of  optical  velocity  Fig.  11  Comparison  of  optical  velocity  scaling 
scaling  methods  with  wheel  odometry  and  INS  methods  with  wheel  odometry  and  INS  position, 
position,  zoomed  in  to  the  end  of  the  run. 


Predicted  vs.  actual  error  -  (easting) 


Fig.  12  Points  used  for  batch  velocity  scale  cal-  Fig.  13  Easting  position  error  and  prediction, 
ibration.  A  linear  fit — mostly  just  capturing  a 
scale  bias  in  this  operation  region — is  overlaid. 
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Predicted  vs.  actual  error  -  (northing) 


Predicted  vs.  actual  error  -  (heading) 


Fig.  14  Northing  position  error  and  prediction.  Fig.  15  Heading  error  and  prediction. 
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Place  Recognition  using  Regional  Point 
Descriptors  for  3D  Mapping 

Michael  Bosse  and  Robert  Zlot 


Abstract  In  order  to  operate  in  unstructured  outdoor  environments,  globally  consis¬ 
tent  3D  maps  are  often  required.  In  the  absence  of  a  absolute  position  sensor  such 
as  GPS  or  modifications  to  the  environment,  the  ability  to  recognize  previously  ob¬ 
served  locations  is  necessary  to  identify  loop  closures.  Regional  point  or  keypoint 
descriptors  are  a  way  to  encode  the  structure  within  a  small  local  region  as  a  fixed¬ 
sized  vector,  though  individually  do  not  include  enough  context  to  fully  identify 
a  previously  seen  place.  Multiple  queries  to  a  database  of  descriptor  vectors  can 
quickly  identify  similar  features,  and  places  can  be  recognized  from  a  consistent  set 
of  descriptor  matches.  We  investigate  the  problem  of  designing  informative  keypoint 
descriptors  for  3D  laser  maps.  Several  models  are  considered  and  evaluated,  with  a 
particular  focus  on  the  optimal  descriptor  scale  and  keypoint  sampling  density.  The 
approach  is  evaluated  on  3D  laser  point  cloud  data  collected  from  a  vehicle  driving 
in  unstructured  off-road  environments.  Consistent  3D  maps  constructed  from  this 
data  without  assistance  from  any  other  sensor  (such  as  wheel  encoders,  GPS,  or 
IMU)  demonstrate  the  effectiveness  of  our  approach. 


1  Introduction 

Building  globally  consistent  3D  maps  is  an  essential  requirement  for  many  au¬ 
tonomous  systems  operating  in  unstructured,  off-road  environments,  including  earth- 
moving,  mining,  construction,  and  agriculture.  In  previous  work  [3],  we  introduce 
an  approach  for  incrementally  estimating  the  6  DoF  trajectory  of  a  vehicle  using  3D 
measurements  taken  from  a  continuously  spinning  2D  laser  mounted  on  a  vehicle 
(Figure  1).  The  resulting  trajectory  can  be  used  to  construct  3D  maps  of  the  environ¬ 
ment;  however  these  maps  are  only  locally  accurate.  In  order  to  correct  global  errors, 
constraints  can  be  introduced  into  the  map  structure  if  place  recognition  events  can 
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(a)  Bobcat  SI 85  (b)  Toyota  Prado  4WD 

Fig.  1:  The  spinning  laser  is  mounted  on  a  variety  of  ground  vehicles  for  data  collection. 


be  reliably  detected.  In  this  paper,  we  investigate  the  place  recognition  problem  for 
producing  globally  consistent  3D  laser  point  cloud  maps  given  locally  consistent  3D 
maps  as  input. 

For  our  purposes,  the  place  recognition  problem  can  be  stated  as  follows:  Given 
a  3D  point  cloud  map  of  a  small  local  region  or  place ,  determine  whether  or  not 
this  place  has  been  previously  observed,  and  if  so,  find  the  relative  coordinate  trans¬ 
formation  aligning  the  matching  places.  These  relative  alignments  can  be  used  as 
constraints  for  building  a  global  map.  Place  recognition  arises  in  several  applica¬ 
tions  including  loop  closure  detection  in  SLAM,  global  localization  in  a  known  map, 
and  fusing  several  maps  collected  over  time  or  simultaneously  on  multiple  vehicles. 

Although  point  cloud  maps  are  metric  by  nature,  the  place  recognition  process 
enforces  topological  constraints  which  correct  the  accumulation  of  registration  er¬ 
rors  over  time.  Typically,  point  cloud  maps  are  constructed  using  incremental  meth¬ 
ods  for  aligning  consecutive  observations  ( e.g .,  Iterated  Closest  Point  (ICP)),  which 
require  a  reasonable  prior  on  the  alignment  transformation.  Therefore,  the  ability  of 
a  place  recognition  algorithm  to  not  only  detect  a  match,  but  to  estimate  a  rough 
alignment,  is  also  critical. 

Several  general  approaches  have  been  explored  for  3D  place  recognition,  mainly 
in  the  context  of  loop  detection  for  SLAM.  One  type  of  solution  performs  pairwise 
comparisons  between  the  current  place  and  all  places  within  a  region  of  uncertainty 
around  the  estimated  robot  pose  [2,  5,  9].  However,  as  noted  by  Newman  et  al.  [8], 
this  approach  can  fail  in  some  situations  where  the  uncertainty  region  does  not  con¬ 
tain  the  matching  place.  In  addition,  at  large  scales,  uncertainty  can  become  so  large 
that  the  exhaustive  pairwise  search  cannot  be  performed  in  a  reasonable  time  [12]. 
Silver  et  al.  [11]  describe  a  system  for  topologically  mapping  underground  mines, 
where  the  tunnel  structure  allows  the  search  for  loop  closures  to  be  limited  to  in¬ 
tersections,  which  can  be  detected  using  2D  laser  scans.  Ryde  and  Hu  [10]  use  a 
coarse-to-fine  occupancy  grid  matching  algorithm  which  initially  samples  the  envi¬ 
ronment  uniformly  for  match  candidates,  then  further  considers  only  the  top  matches 
when  refining  at  a  higher  resolution.  Our  approach  uses  a  global  database  of  regional 
descriptors  or  key  points  extracted  from  all  previously  observed  places.  Place  recog¬ 
nition  is  accomplished  by  finding  those  places  which  have  a  significant  number  of 
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keypoints  in  common  with  the  query  place.  As  the  database  can  be  searched  in 
sublinear  time  with  respect  to  the  number  of  keypoints,  this  approach  can  be  imple¬ 
mented  efficiently  and  independent  of  current  pose  uncertainty  or  loop  scale. 

The  use  of  regional  shape  descriptors  is  also  a  common  approach  for  object 
recognition  ( e.g .,  [6,  7]).  However,  for  the  object  recognition  problem,  descriptors 
from  a  scene  are  typically  matched  against  a  set  of  candidate  objects  in  a  predefined 
library  to  detect  the  existence  of  a  particular  type  and  variant  of  an  object.  In  addi¬ 
tion,  when  attempting  to  recognize  places,  it  is  not  always  obvious  how  to  partition 
the  environment,  so  boundary  effects  can  occur  if  the  segments  are  not  in  precisely 
in  the  same  location.  While  in  object  recognition,  occlusion  is  an  important  con¬ 
sideration,  in  place  recognition  changes  in  the  environment — such  as  parked  cars 
changing  location,  changing  vegetation,  movable  furniture,  doors,  people,  etc. — are 
also  important  factors. 

This  paper  contributes  an  evaluative  study  of  regional  shape  descriptor  models 
for  outdoor  place  recognition.  In  addition  to  a  comparison  of  several  models,  we 
also  investigate  the  optimal  descriptor  scale  and  keypoint  sampling  density,  which 
are  often  arbitrarily  or  unsystematically  chosen.  We  extend  a  descriptor  model  first 
described  for  use  in  two  dimensional  maps  [1]  for  use  in  three  dimensional  maps. 
This  extension  is  not  trivial,  as  we  must  be  able  to  consistently  identify  three  de¬ 
grees  of  rotational  freedom  in  the  descriptor  coordinate  frame,  as  opposed  to  one 
rotational  DoF  in  2D  maps.  Existing  approaches  tend  to  fix  two  of  the  three  dimen¬ 
sions,  and  make  the  descriptors  rotationally  invariant  to  the  third,  at  a  performance 
cost  [6,  7].  The  regional  keypoint  descriptors  are  used  in  a  nearest  neighbor  vot¬ 
ing  scheme  to  identify  and  recognize  previously  observed  places.  The  ability  of 
our  approach  to  autonomously  generate  registered  outdoor  maps  is  demonstrated  in 
challenging  unstructured  environments. 

The  remainder  of  the  paper  is  organized  as  follows.  In  Section  2  we  describe  our 
framework  and  compare  several  keypoint  selection  strategies  and  descriptor  models. 
We  present  mapping  results  generated  continuously  from  a  spinning  laser  range  sen¬ 
sor  in  both  industrial  and  off-road  environments  in  Section  3.  Finally,  in  Section  4, 
we  present  our  conclusions  and  future  work. 


2  Approach 

In  this  section,  we  compare  several  keypoint  selection  heuristics  and  descriptor  mod¬ 
els  in  terms  of  their  suitability  and  effectiveness  in  our  place  recognition  framework. 
We  partition  our  maps  by  considering  the  data  from  each  five-second  segment  of 
vehicle  trajectory  to  be  a  “place”.  For  each  place,  we  extract  a  set  of  keypoints, 
compute  their  descriptors,  and  then  query  a  global  keypoint  database  for  each  new 
keypoint’s  ten  nearest  neighbors1.  Each  returned  neighbor  votes  for  the  local  map 
that  generated  it  in  a  positional  voting  scheme.  The  local  maps  with  the  highest 

1  We  use  the  ANN  kd- tree  implementation  to  find  approximate  (e  =  1)  nearest  neighbors: 

http : //www. cs . umd. edu/~mount /ANN 
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vote  scores,  if  over  a  threshold,  are  considered  as  candidate  place  matches.  The 
matches  can  be  subsequently  verified  using  a  geometric  consistency  check  among 
the  matched  keypoints,  followed  by  a  global  graph  optimization  step. 

Raw  descriptor  vectors  are  initially  processed  in  order  to  embed  them  in  a  Eu¬ 
clidean  space  and  reduce  the  number  of  dimensions.  The  embedding  occurs  in  two 
steps,  and  requires  training  data  containing  pairs  of  matched  keypoints  from  differ¬ 
ent  passes  through  the  environment.  First,  each  individual  dimension  is  transformed 
by  applying  the  empirical  cumulative  distribution  function  to  the  values,  resulting  in 
a  ID  uniform  distribution  for  that  dimension.  Second,  the  resulting  descriptor  is  lin¬ 
early  transformed  to  maximize  the  separability  between  matching  and  non-matching 
pairs.  Dimension  reduction  is  then  used  to  prevent  overfitting  to  noise  in  addition 
to  improving  the  computation  efficiency  of  nearest  neighbor  searches.  A  detailed 
description  of  this  descriptor  processing  technique  is  available  in  a  previous  publi¬ 
cation  [1]. 

The  first  step  in  representing  a  place  by  a  set  of  keypoints  is  to  select  the  locations 
about  which  to  build  descriptors.  To  ensure  a  high  place  recognition  rate,  keypoints 
should  be  selected  such  that  the  likelihood  of  selecting  keypoints  in  a  similar  loca¬ 
tions  on  a  subsequent  pass  is  high.  The  simplest  selectors  either  sample  randomly 
from  the  point  cloud  or  distribute  the  keypoints  in  space.  With  an  intelligent  selec¬ 
tion  scheme  there  is  the  potential  to  use  much  fewer  points  as  compared  to  more 
simplistic  sampling  approaches,  while  still  ensuring  they  are  stable  and  salient.  Ad¬ 
ditionally,  using  too  many  points  will  unnecessarily  increase  the  computation  and 
memory  requirements  of  the  system.  However,  we  found  little  advantage  in  any  of 
the  intelligent  selection  heuristics  we  experimented  with,  and  intend  to  further  study 
this  problem  in  future  work. 

Our  selection  heuristic  therefore  simply  subsamples  the  original  point  cloud  such 
that  no  two  points  are  within  a  minimum  distance,  effectively  limiting  the  maximum 
density  to  a  predetermined  value.  For  computing  the  orientation  at  a  keypoint,  we 
first  determine  the  first-  and  second-order  moments  of  the  point  cloud  within  a  fixed 
distance  of  the  keypoint.  By  taking  the  eigenvectors  of  the  covariance  matrix,  we 
are  able  to  determine  local  coordinate  axes,  up  to  a  sign  ambiguity.  To  resolve  this 
ambiguity,  we  ensure  that  the  z-axis  has  a  positive  component  in  the  local  “up” 
direction,  and  the  x-axis  has  a  positive  dot  product  with  the  direction  to  the  center 
of  mass. 


2.1  Keypoint  Description 

Typically,  3D  keypoint  descriptors  divide  the  space  around  the  keypoint  into  a  set 
of  bins,  then  compute  statistics  on  the  points  that  fall  into  each  bin  to  produce  a 
descriptor  vector.  We  consider  several  descriptor  types,  some  of  which  are  taken 
from  the  existing  literature  while  others  are  of  novel  design.  We  consider  three  key 
properties  as  design  parameters  for  any  descriptor: 
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Shape  The  main  defining  characteristic  of  a  keypoint  descriptor  is  the  arrange¬ 
ment  and  number  of  the  bins  around  the  keypoint.  There  is  a  trade-off  between 
the  number  of  bins  and  the  potential  descriptiveness  of  the  feature.  When  there 
are  too  many  bins,  many  of  the  bins  will  be  undersampled,  thereby  introducing 
non-Gaussian  quantization  errors  to  the  descriptors,  which  are  difficult  to  model. 
Scale  There  is  a  trade-off  in  determining  the  size  of  the  local  support  region 
around  a  keypoint  from  which  to  build  the  descriptor.  Larger  scales  have  more 
context  from  which  to  make  a  salient  descriptor,  while  smaller  scales  are  not 
as  sensitive  to  boundary  effects,  missing  data,  and  transient  objects.  However, 
computing  larger- scale  keypoints  require  significantly  more  computation. 
Statistics  The  statistics  gathered  for  each  bin  are  also  important  in  defining  the 
information  contained  in  the  descriptor  vector.  Many  existing  descriptor  mod¬ 
els  [6,  7]  include  only  a  count  of  the  number  of  points  in  each  bin;  however, 
higher-order  moments  [12]  or  even  complete  orientation  histograms  [4]  are  also 
possible.  Care  must  be  taken  in  the  design  of  the  number  and  shape  of  the  bins  to 
ensure  that  enough  points  fall  into  each  bin  in  order  for  the  higher  order  statistics 
to  be  reliable. 

For  the  experiments  presented  in  this  paper,  we  evaluate  the  effect  of  varying 
the  keypoint  scale  for  three  main  descriptor  shape  types:  spin  images,  shape  con¬ 
texts,  and  moment  grids.  As  the  descriptors  have  multiple  parameters  related  to  their 
shape  ( e.g .,  number  and  size  of  bins),  determining  the  optimal  shapes  is  a  difficult 
high-dimensional  optimization  problem.  The  parameters  used  for  each  descriptor 
are  based  upon  a  non-exhaustive  empirical  study  in  which  a  sampling  of  reasonable 
configurations  was  explored. 


2.1.1  Base  Dimensions 

To  each  descriptor  model,  we  include  seven  rotationally  invariant  shape-related 
statistics  derived  from  the  eigenvalues  of  the  covariance  matrix  of  the  local  point 
cloud.  These  include  the  three  eigenvalues  (where  Ai  >  A2  >  A3),  the  planarity  of 
the  region  p  =  2 (A2  —  A3) the  cylindrical-ness  of  the  region  c  =  (Ai  —  fa) /s,  as 
well  as  p  +  c,  and  p  —  c,  where  s  =  Ai  +  A2  +  A3 .  When  used  on  its  own  as  a  descrip¬ 
tor,  the  seven  base  dimensions  are  reduced  to  three  dimensions  by  the  preprocessing 
algorithm. 


2.1.2  Spin  and  Shell  Images 

Spin  images  [7]  are  cylindrical  descriptors  divided  into  bins  both  radially  and  along 
the  cylinder  axis.  The  resulting  support  region  therefore  consists  of  a  set  of  3D  an¬ 
nular  bins.  The  descriptor  vector  consists  of  a  count  of  the  number  of  points  in  each 
of  these  bins.  For  the  implementation  used  in  this  paper,  we  choose  10  radial  and  10 
height  divisions  yielding  (with  the  seven  base  dimensions)  a  107-dimensional  de- 
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scriptor  vectors  that  typically  (depending  on  the  scale)  reduce  down  to  9-dimensions 
after  preprocessing. 

Shell  images  are  a  generalization  of  spin  images  that  are  rotationally  invari¬ 
ant  as  the  bins  are  defined  as  spherical  shells  about  the  keypoint.  We  choose  20 
quadratically  spaced  radial  divisions  which  after  preprocessing  typically  results  in  a 
5 -dimensional  feature  vector. 


2.1.3  3D  Shape  Contexts 

The  support  region  of  3D  shape  contexts  [6]  are  spheres,  with  the  bins  arranged 
by  dividing  space  quadratically  along  the  radius,  and  linearly  in  the  azimuth  and 
elevation  directions.  As  with  spin  images,  the  descriptor  vector  simply  maintains 
a  count  of  the  number  of  points  in  each  bin.  In  contrast  to  the  original  authors’ 
approach,  we  do  not  normalize  either  the  shape  context  or  spin  image  by  volume  or 
density,  as  these  effects  are  accounted  for  in  our  descriptor  normalization  procedure. 
Since  the  average  point  density  is  much  lower  in  our  data  set  as  compared  to  the 
original  authors’,  we  choose  5  radius,  8  azimuth,  and  4  elevation  divisions.  The 
resulting  167-dimensional  descriptors  is  typically  reduced  to  18  dimensions  after 
preprocessing.  We  also  do  not  need  to  repeat  the  descriptor  for  every  rotation  of  the 
azimuth  bins  since  we  can  reliably  compute  all  3  rotational  degrees  of  freedom  of 
the  keypoint  frame. 


2.1.4  Moment  Grids 

3D  moment  grids  use  a  rectilinear  voxelization  of  the  space  around  the  keypoint. 
They  are  extended  from  a  2D  descriptor  previously  used  for  two-dimensional  place 
recognition  [12].  For  each  cube-shaped  bin,  we  compute  moments  up  to  second- 
order,  for  a  total  of  ten  per  voxel  to  include  in  the  descriptor  vector.  We  use  2x2x2 
and  3x3x3  grids,  as  well  as  a  2  x  2  x  2  cylindrical  version  (radius,  azimuth, 
height).  Including  the  base  dimensions,  the  sizes  of  these  descriptors  are  87,  277, 
and  87  dimensions  respectively,  but  are  typically  reduced  by  the  processing  algo¬ 
rithm  to  16,  20,  and  12  dimensions. 


3  Results 

In  order  to  assess  ideal  keypoint  scales  and  densities,  a  registered  map  is  required 
for  training  and  testing  purposes.  An  earlier  untuned  version  of  this  approach  was 
used  with  a  separate  training  dataset  to  create  a  globally  consistent  point  cloud  and 
trajectory  from  which  matching  keypoint  pairs  could  be  extracted.  The  training  data 
was  collected  in  both  off-road  and  industrial  environments.  To  generate  the  match 
set,  the  dataset  is  evenly  sampled  at  0.5  m  resolution  and  matching  keypoint  pairs 
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are  identified  as  those  that  are  within  0.5  m  and  15  degrees,  and  observed  more  than 
30  seconds  apart.  A  set  of  unmatched  pairs  is  generated  by  selecting  keypoints  at 
random.  The  retrieval  rates  are  measured  using  half  of  the  matched  and  unmatched 
pairs  as  training,  with  the  remainder  as  a  validation  set  in  a  repeated  random  sub¬ 
sampling  cross-validation  scheme. 


3.1  Key  point  Scale 


Our  first  aim  is  to  determine  an  appropriate  scale  for  the  region  encoded  by  each 
keypoint  descriptor.  For  each  descriptor  type,  we  generate  keypoints  at  a  variety  of 
scales  from  1  m  to  10  m  and  measure  the  retrieval  accuracy  over  the  known  matches. 
Note  that  the  keypoint  orientations  are  dependent  on  the  scale,  so  must  be  computed 
separately  for  each  experiment.  To  quantify  the  accuracy  at  each  descriptor  scale,  we 
measure  the  precision  and  recall  rates  to  compute  the  F-measure,  which  is  defined 
as  the  harmonic  mean  of  precision  and  recall. 

Figure  2a  plots  the  F-measure  versus  the  scale  for  each  descriptor  type  consid¬ 
ered.  As  expected,  we  observe  an  improvement  in  F-measure  as  the  scale  increases; 
however,  after  about  7  m  the  improvement  tapers  off  for  all  descriptor  types.  We 
interpret  this  result  as  demonstrating  a  general  improvement  as  more  points  are  con¬ 
tained  within  the  keypoint  region;  eventually,  however,  sensitivity  to  boundary  ef¬ 
fects  and  misregistrations  decrease  the  utility  of  the  descriptors.  We  also  note  that 
the  three  moment  grid  descriptors  perform  best,  though  the  number  of  bins  and  their 
arrangement  seem  to  be  less  important.  This  suggests  that  the  inclusion  of  higher- 
order  moments  adds  meaningful  information,  as  long  as  the  bins  contain  a  sufficient 
number  of  support  points  for  meaningful  statistics  to  be  computed.  The  3D  shape 
context  descriptor  performs  nearly  as  well  as  the  moment  grid  descriptors. 


3.2  Keypoint  Density 

Given  an  appropriate  descriptor  scale,  we  now  evaluate  the  overall  map  matching 
performance  resulting  from  the  nearest  neighbor  voting  procedure  described  in  Sec¬ 
tions  1  and  2.  For  the  purposes  of  the  voting,  we  aggregate  the  votes  over  five- second 
intervals,  with  the  associated  place  represented  by  the  trajectory  point  at  the  center 
of  that  interval.  The  choice  of  five  seconds  is  made  to  provide  sufficient  time  to 
observe  enough  context  around  the  vehicle  without  being  overly  influenced  by  drift 
accrued  in  the  local  scan-matching.  For  every  query  place,  we  retain  the  twenty 
places  with  the  highest  vote  scores.  For  each  potential  map  match  pair,  we  also  per¬ 
form  a  rigidity  test  on  the  keypoint  matches  and  reduce  the  vote  score  by  removing 
any  pairs  not  geometrically  consistent  with  the  largest  cluster  of  keypoint  transfor¬ 
mations  (RANSAC  could  also  have  been  used  for  this  step).  Any  scores  higher  than 
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(a)  Keypoint  scale  vs  keypoint  match  accuracy  (b)  Keypoint  density  vs  map  match  accuracy 

Fig.  2:  Results  from  experiments  examining  the  F-measure  at  various  keypoint  scales  and  densities. 
Each  datapoint  in  (a)  represents  an  average  over  30  trials,  with  3 o  error  bars.  Note  that  (a)  measures 
accuracy  relative  to  keypoint  retrieval,  whereas  (b)  is  with  respect  to  overall  map  match  detections. 


a  threshold  are  then  taken  as  a  map  match.  Note  that  additional  consistency  checks 
(< e.g .,  ICP)  could  be  made  to  further  invalidate  false  positives. 

In  this  set  of  experiments,  the  map  matching  performance  is  evaluated  for  differ¬ 
ent  keypoint  densities  for  all  the  descriptor  types  and  a  fixed  scale  of  7  m.  The  map 
matching  F-measure  is  defined  similarly  to  the  keypoint  match  F-measure  using 
the  precision  and  recall  rates  resulting  from  thresholding  the  vote  scores.  Note  that 
when  determining  ground  truth  for  map  match  pairs  it  is  often  difficult  to  decide 
whether  there  is  enough  overlap  to  warrant  a  match;  therefore  we  evaluate  preci¬ 
sion  and  recall  based  on  the  examples  where  this  decision  is  clear.  The  true  positive 
map  matches  are  defined  as  places  that  are  temporally  distinct  (visited  more  than  30 
seconds  apart),  spatially  near  (the  trajectories  are  within  10m),  and  which  observe 
points  overlapping  with  (within  0.5  m  of)  another  place  in  the  globally  registered 
point  cloud.  True  negatives  are  defined  as  temporally  distinct,  spatially  distant  (tra¬ 
jectories  are  more  than  20  m  apart),  and  non-overlapping. 

Figure  2b  illustrates  the  map  match  F-measures  for  various  keypoint  densities. 
The  results  demonstrate  that  fairly  low  keypoint  densities  (0.1  to  0.3  points  per 
cubic  meter)  are  sufficient  to  achieve  F-measures  close  to  the  maximum,  as  the  F- 
measure  curve  essentially  flattens  after  this  density.  Again,  the  moment  grid  and  3D 
shape  context  descriptors  are  observed  to  perform  significantly  better  than  the  other 
alternatives  tested. 


3.3  Mapping  Results 

Two  maps  constructed  using  our  approach  are  illustrated  in  Figure  3.  In  both  maps, 
loop  closures  are  detected  using  the  described  place  recognition  framework  with 
3x3  moment  grid  descriptors  at  a  scale  of  7  m  and  density  of  0.3  per  cubic  meter. 
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(a)  Map  1 :  overhead  view 


(b)  Map  1:  trajectory  over  time  with  matches 


Fig.  3:  Two  maps  constructing  using  the  keypoint  voting  scheme,  (a)  Map  of  a  lightly  wooded 
off-road  environment  overlaid  on  a  Google  Maps  image,  (b)  The  trajectory  of  map  1  with  time 
indicated  on  the  vertical  axis.  The  links  between  places  correspond  to  matches  that  pass  the  vote 
score  threshold  and  geometric  consistency  checks.  Color  indicates  the  strength  of  the  score  after 
the  consistency  check  (blue  low,  red  high).  The  loop  optimization  step  re- weights  the  edges  and  is 
robust  to  the  incorrect  matches,  (c)  The  open-loop  and  closed-loop  (after  loop  closures)  trajectories 
from  a  second  map.  (d)  Oblique  view  of  the  second  map  with  color  indicating  terrain  height. 


A  globally  consistent  trajectory  is  computed  by  a  robust  minimization  of  the  loop 
closure  constraints  (details  of  which  are  beyond  the  scope  of  this  paper),  and  the  3D 
point  cloud  is  updated  from  the  corrected  trajectory.  While  we  do  not  have  ground 
truth  for  these  maps,  the  accuracy  can  be  verified  qualitatively  by  comparison  with 
satellite  imagery  (as  in  Figure  3a). 
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4  Conclusions 


A  systematic  investigation  into  the  design  of  regional  point  descriptors  has  led  to 
a  model  and  associated  parameters  suitable  for  3D  place  recognition.  Though  sev¬ 
eral  of  the  models  considered  are  sufficiently  capable  of  finding  loop  closures  in 
our  data,  moment  grid  descriptors  perform  best  in  terms  of  accuracy  and  efficiency 
at  this  task.  While  the  calculation  of  higher-order  moments  for  these  descriptors 
requires  about  13%  more  time,  this  increase  in  computation  cost  is  not  highly  sig¬ 
nificant  as  keypoint  generation  can  easily  be  done  in  real-time.  A  limitation  of  our 
approach  is  that  the  descriptors  must  be  calibrated  using  a  registered  training  set 
and  it  can  be  challenging  to  obtain  a  sufficiently  large  number  of  true  positive  ex¬ 
amples.  However,  since  the  training  step  can  be  performed  on  a  separate  dataset,  the 
proposed  place  recognition  algorithm  can  be  run  online.  Future  work  will  focus  on 
developing  intelligent  keypoint  selection  heuristics  which  will  reduce  computation 
load  while  still  maintaining  a  high  probability  of  keypoint  detection. 
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Summary.  During  search  missions  in  disaster  environments,  an  important  task  for 
mobile  robots  is  map  building.  An  advantage  of  three-dimensional  (3-D)  mapping 
is  that  it  can  provide  depictions  of  disaster  environments  that  will  support  robotic 
teleoperations  used  in  locating  victims  and  aid  rescue  crews  in  strategizing.  How¬ 
ever,  the  3-D  scanning  of  an  environment  is  time-consuming  because  a  3-D  scanning 
procedure  itself  takes  a  time  and  scan  data  must  be  matched  at  several  locations. 
Therefore,  in  this  paper,  we  propose  a  scan-point  planning  algorithm  to  obtain  a 
large  scale  3-D  map,  and  we  apply  a  scan-matching  method  to  improve  the  accuracy 
of  the  map.  We  discuss  the  use  of  scan-point  planning  to  maintain  the  resolution  of 
sensor  data  and  to  minimize  occlusion  areas.  The  scan-matching  method  is  based  on 
a  combination  of  the  Iterative  Closest  Point  (ICP)  algorithm  and  the  Normal  Dis¬ 
tribution  Transform  (NDT)  algorithm.  We  performed  several  experiments  to  verify 
the  validity  of  our  approach. 

Key  words:  Search  and  Rescue,  Scan  points  planning 

1  Introduction 

Recently,  requests  for  the  development  of  robotic  systems  for  search-and- 
rescue  operations  have  been  increasing  rapidly.  After  the  Hanshin-Awaji 
earthquake  in  1995  (Japan)  and  the  World  Trade  Center  attack  in  2001 
(U.S.A.),  large  research  projects  for  search  and  rescue  were  kicked  off  in  Japan. 

One  of  the  important  tasks  for  mobile  robots  in  search-and-rescue  mis¬ 
sions  is  map  building.  Three-dimensional  (3-D)  mapping  is  used  to  provide 
representations  of  disaster  environments  that  will  support  robotic  teleopera¬ 
tions  used  in  locating  victims  and  aid  rescue  crews  in  strategizing.  To  realize 
3-D  mapping  in  disaster  environments,  we  constructed  the  3-D  scanner  de¬ 
vice  shown  in  Figure  l-(a).  It  consists  of  a  laser  range  finder  (SICK),  rotation 
stage,  and  CCD  camera  to  obtain  information  in  color.  It  takes  one  minute  to 
scan  an  environment  at  one  place,  and  the  device  generates  3-D  information  to 
create  a  remote  display  of  the  target  environment  (e.g.,  Figure  l-(b) ) ,  which 
helps  the  rescuers  get  a  complete  picture. 
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(a)  Developed  sensor  device  (b)  Acquisition  example 


Fig.  1.  Development  of  a  sensor  system  and  acquisition  example  of  an  environment 

Search-and-rescue  missions  require  rapid,  accurate  mapping.  However,  to 
map  a  large-scale  environment,  rescue  robots  must  obtain  3-D  information  in 
different  locations,  and  the  scan  data  obtained  at  different  locations  should  be 
merged  to  represent  a  large-scale  environment.  In  such  cases,  reasonable  scan- 
point  planning  is  very  important;  however,  measurements  in  different  locations 
generate  gaps  of  objects  in  the  map  caused  by  positioning  errors  at  each  scan 
point.  Therefore,  we  propose  a  scan-point  planning  algorithm  to  obtain  a  large 
scale  3-D  map,  and  we  have  applied  a  scan-matching  method  to  improve  map 
accuracy.  In  the  scan-point  planning  algorithm,  the  resolution  of  sensor  data 
and  occlusion  area  allow  as  much  of  a  target  area  as  possible  to  be  covered.  The 
scan-matching  method  is  based  on  the  Iterative  Closest  Point  (ICP)  algorithm 
[1]  and  the  Normal  Distribution  Transform  (NDT)  algorithm  [2].  In  this  paper, 
we  discuss  the  construction  of  3-D  maps  in  outdoor  environments,  and  we 
report  the  results  of  the  environment  mapping  of  our  campus  buildings. 


2  Related  Works 

Two  major  approaches  are  used  to  obtain  a  3-D  map.  One,  stereo  matching, 
involves  the  use  of  two  or  more  cameras  (e.g.,  [3]),  while  the  other  involves  a 
3-D  laser  range  scanner  (e.g.,  [4]).  The  merits  of  the  former  approach  are  fast 
measurement  and  simultaneous  acquisition  of  texture  information.  However, 
there  are  some  disadvantages:  (1)  stereo  matching  requires  brightness  and 
feature  information,  (2)  its  measurement  area  is  narrow,  and  (3)  distance 
accuracy  decreases  as  the  distance  from  the  objects  increases.  Therefore,  in 
this  study,  we  use  a  3-D  laser  range  scanner  with  a  CCD  camera  to  produce 
the  3-D  color  map  shown  in  Figure  1. 

In  regard  to  scan-point  planning,  a  classic  problem  is  the  Art  Gallery  Prob¬ 
lem  [5],  which  involves  the  number  of  guards  required  to  monitor  the  target 
floor  completely.  Basically,  the  problem  assumes  that  the  shape  of  the  floor 
is  known  and  the  target  environment  is  2-dimensional  (2-D).  Recently,  some 
novel  researches  for  view  point  planning  were  performed  in  3-D.  Sequeira  et 
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(a)  Point  cloud 


(b)  DEM 


(c)  MLS-map 


Fig.  2.  Comparison  of  DEM  and  MLS-map 


al.  proposed  a  view  planning  method  for  automatic  acquisition  of  environment 
information  with  consideration  of  sampling  density  [6] .  In  our  scan-point  plan¬ 
ning,  we  propose  a  scan-point  planning  method  that  combines  frontier-based 
navigation  [7]  and  the  Art  Gallery  Problem. 

Recently,  scan  matching  has  been  used  for  the  adjustment  of  the  relative 
position  of  the  scan  data  and  particularly  for  simultaneous  localization  and 
mapping  (SLAM)  in  mobile  robot  navigation [8].  In  this  research,  we  apply 
the  conventional  Iterative  Closest  Points  (ICP)  algorithm  [1]  and  the  Normal 
Distribution  Transform  (NDT)  algorithm  [2]  to  our  scan  data.  Because  both 
algorithm  have  advantages  and  disadvantages,  we  have  combined  them  to 
obtain  a  robust  and  accurate  3-D  map. 


3  Scan-Point  Planning 

To  obtain  a  large  scale  environment  by  repeated  3-D  scanning,  we  have  es¬ 
tablished  the  following  procedures: 

1.  Conducting  a  3-D  scan 

2.  Representing  the  scan  information  on  a  Multi-Level  Surface  map  (MLS- 
map) 

3.  Determining  the  movable  region 

4.  Planning  the  next  scan  point  in  the  region 

5.  Moving  the  3-D  scanner  to  the  designated  point 

6.  Repeating  steps  1  through  5  until  the  target  region  is  completely  covered 


3.1  Representation  of  scan  information 

To  perform  scan-point  planning  in  large-scale  environments,  point  cloud  rep¬ 
resentation  is  unsuitable  because  of  the  massive  volume  of  data  and  the  non¬ 
constant  density.  Therefore,  at  the  beginning  of  this  study,  we  applied  a  Digi¬ 
tal  Elevation  Map  (DEM)  to  represent  target  environments  [9]  for  scan-point 


4 


Nagatani  et  al. 


planning.  In  this  method,  each  scan  point  is  registered  into  one  cell  of  the  lat¬ 
tice  domain  on  a  2  dimensional  (2-D)  x-y  plane  as  height  information  from  a 
base  level.  The  DEM  advantageously  represents  a  non- flat  environment.  How¬ 
ever,  the  method  can  not  be  used  to  represent  spaces  under  objects  because 
only  the  highest  scan  point  is  effective.  An  example  of  point  cloud  represen¬ 
tation  is  shown  in  Figure  2- (a).  It  is  a  side  view  of  a  four-story  building  (left) 
and  a  tree  (right).  A  DEM  representation  of  Figure  2- (a)  is  shown  in  Fig¬ 
ure  2-(b)  from  the  same  viewpoint.  A  space  under  the  tree  is  not  visible  in 
the  DEM  representation,  which  is  undesirable  for  the  determination  of  the 
movable  region  of  mobile  robots. 

A  Multi-Level  Surface  map  (MLS-map) 

[10]  is  one  solution  to  the  representation  of 
such  an  environment.  In  this  method,  some 
edge-point  positions  of  objects  are  stored 
in  each  cell  of  the  lattice  domain  on  a  2- 
D  x-y  plane.  Figure  3  shows  a  concept  of 
the  MLS-map.  Information  about  two  ob¬ 
jects  is  stored  in  one  cell,  which  represents 
a  space  between  the  objects.  Based  on  the 
above  method,  the  point  cloud  representa¬ 
tion  in  Figure  2- (a)  is  represented  by  the 
MLS-map  in  Figure  2-(c).  The  space  under 
the  tree  is  now  visible. 

3.2  Determination  of  movable  region 

For  mobile  robots,  the  path  between  the  current  scan  point  and  the  next 
scan  point  must  be  connected.  Therefore,  we  define  a  movable  region  as  an 
area  (1)  which  is  connected  to  a  current  scan  point  and  (2)  whose  differential 
height  between  adjacent  cells  in  the  MLS-map  is  smaller  than  the  threshold. 
Of  course,  the  movable  region  depends  on  the  mobility  of  the  target  mobile 
robot.  In  our  implementation,  the  threshold  was  set  at  0.15  [m]. 

3.3  Region  segmentation 

To  maintain  the  resolution  of  scan  data,  we  divide  the  target  region  into 
three  regions,  (1)  scan-completed  region,  (2)  low-resolution  region,  and  (3) 
unscanned  region.  In  reality,  map  resolution  depends  on  not  only  the  scan 
range  but  also  the  orientation;  however,  to  simplify  the  planning  of  the  next 
scan  points,  we  use  the  following  definitions:  (1)  The  scan-completed  region 
(C)  is  a  set  of  cells  in  the  target  region  which  has  scan  data  and  whose  distance 
from  the  closest  scan  point  is  less  than  a  fixed  value.  In  our  implementation, 
the  distance  value  is  set  at  20  [m].  (2)  The  low-resolution  region  (M)  is  a  set 
of  cells  in  the  target  region  which  has  scan  data  but  does  not  belong  to  the 
scan-completed  region.  (3)  The  unscanned  region  ( U )  is  a  region  which  does 
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Fig.  3.  Introduction  to  MLS  map 
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not  belong  to  (1)  and  (2)  because  a  cell  in  the  region  is  too  far  from  the  scan 
points  or  is  occluded  by  objects. 

Figure  4  shows  an  example  of  the  re¬ 
gion  segmentation  of  an  initial  scan  from 
scan  point  A.  The  white  region  is  the 
scan-completed  region  (Ca)  from  point  A, 
the  gray  region  is  the  low-resolution  re¬ 
gion  (Ma),  and  the  blue  rectangles  are  ob¬ 
stacles.  The  dark-gray  region  is  the  un¬ 
scanned  region  ([/),  where  the  blue  rect¬ 
angles  occlude  or  are  far  from  point  A. 

3.4  Evaluation  function  for  next 
scan  point 

To  minimize  the  scan  procedure  while  maintaining  the  resolution  of  the  scan 
data,  the  unscanned  region  or  the  low-resolution  region  should  be  changed,  as 
much  as  possible,  into  a  scan-completed  region  by  the  next  scan.  Two  different 
indices  are  included,  therefore,  we  defined  an  evaluation  function  as  follows: 

Fx  =  area((U  fi  Cb)  U  (U  Pi  Mb))  •  + 

area(MA  H  Cb)  •  (1  —  a)  (1) 

where  area(-)  is  an  area  value  in  the  bracket,  C(B)  is  a  prospective  area  which 
becomes  a  scan-completed  region  when  the  next  scan  is  conducted  at  point  B, 
M(B)  is  a  prospective  area  which  becomes  a  low-resolution  region  when  the 
next  scan  is  conducted  at  point  B,  and  a  is  the  weight  value.  If  exploration 
in  the  unscanned  region  is  not  important,  the  a  should  be  very  small  or  zero. 

The  calculation  of  Equation  (1)  at  one  cell  in  the  movable  region  requires 
a  ray-tracing  scan  in  3-D  virtual  space,  which  is  time-consuming.  Therefore, 
we  applied  a  hill-climbing  search  from  several  randomized  initial  locations. 

Figure  5  shows  a  top  view  of  an  example  of  a  planned  result  in  an  MLS-map 
in  the  case  of  an  initial  scan.  In  this  figure,  the  blue  circle  depicts  an  initial  scan 
point,  the  red  circle  is  the  planned  scan  point  in  the  case  of  a  =  1.0,  the  green 
circle  is  the  planned  scan  point  in  the  case  of  alpha  =  0.0,  the  flesh-colored 
region  is  the  scan-completed  area,  the  yellow  region  is  a  low-resolution  movable 
area,  and  the  blue  region  is  a  low-resolution  but  not  movable  area.  Figure  6 
shows  virtual  views,  (a)  is  a  view  from  the  initial  scan  point  corresponding  to 
the  blue  circle  in  Figure  5,  (b)  is  a  view  from  the  planned  scan  point  in  the 
case  of  a  =  0.0  corresponding  to  the  red  circle  in  Figure  5,  and  (c)  is  a  view 
from  the  planned  scan  point  in  the  case  of  a  =  1.0  corresponding  to  the  green 
circle  in  Figure  5.  In  the  virtual  views,  scanning  in  unknown  area  has  priority 
in  the  case  of  a  m  1.0. 


Fig.  4.  Region  segmentation 
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Fig.  5.  Top  view  of  an  example  of  target  region 


(a)  A  view  from  the  initial  (b)  A  view  from  the  next  (c)  A  view  from  the  next 
scan  point  scan  point  (a  —  0.0)  scan  point  (a  —  1.0) 


Fig.  6.  An  example  of  a  scan  point 

3.5  Examples  of  planning  results  in  outdoor  environments 

We  tried  the  above  scan-point  planning  method  in  several  different  outdoor 
environments.  The  motion  of  the  3-D  scan  unit  to  the  next  scan  position  was 
performed  by  a  human  operator,  not  by  a  mobile  robot  platform.  Termination 
judgement  was  made  by  the  operator,  who  checked  the  coverage  region.  Due 
to  space  limitations,  only  one  experimental  result  will  be  discussed  here. 

The  target  environment  is  our  campus  (Mechanical  Department,  Aoba- 
Yama,  Tohoku  University,  Japan),  which  includes  buildings,  trees,  and  parking 
slots.  The  weighting  factor  of  a  is  set  at  0.5,  and  the  cell  size  of  the  MLS-map 
is  set  at  1.0  [m]  square.  Figure  7  shows  a  transition  of  the  scan  position  and 
scan  area  in  the  first  environment.  In  the  13  scan  and  movement  motions, 
the  detection  area  (the  map  itself)  was  expanded  step  by  step.  The  total 
detection  area  is  about  16,500  grids  (that  is  equal  to  square  meters).  The  result 
shows  that  our  proposed  algorithm  worked  well  for  the  large-scale  outdoor 
environment. 
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Fig.  7.  A  result  of  scan-point  planning  in  an  outdoor  environment  (our  campus) 

4  Scan  matching  for  improved  map  accuracy 

In  outdoor  environments,  global  positioning  systems  (GPS),  electromagnetic 
sensors,  and  inclination  sensors  can  be  used  to  obtain  a  scan  point.  However, 
because  of  sensing  errors  or  noise,  each  scan-point  location  may  include  a  po¬ 
sitioning  error.  To  compensate  for  such  errors,  scan  matching  is  used  to  adjust 
the  location  and  to  construct  an  accurate  map,  particularly  in  simultaneous 
localization  and  mapping  (SLAM).  For  scan  matching,  we  apply  the  Iterative 
Closest  Point  (ICP)  algorithm  [1]  and  the  Normal  Distribution  Transform 
(NDT)  algorithm  [2]. 

4.1  Summary  of  the  ICP  algorithm 

In  the  ICP  algorithm,  two  given  point  sets  are  registered  in  Cartesian  co¬ 
ordinates.  In  each  iteration  step,  the  algorithm  selects  the  closest  points  as 
correspondences  and  calculates  the  rotation  matrix  R  and  the  translation 
matrix  t  to  minimize  the  following  equation: 

Nm  Nd 

E(R,  t)  =  cdij  || mi  —  (Rdj  + 1)  ||  (2) 

i=l  j=  1 

where  Nm  and  Nd  are  the  number  of  points  in  the  reference  data  set  M  and 
the  matching  data  set  D  respectively,  ujij  =  1  when  rrii  is  the  closest  point 
to  dj,  and  Uij  =  0  otherwize. 

To  improve  the  accuracy  of  the  ICP  algorithm,  we  applied  the  ICP  algo¬ 
rithm  of  points  and  segments,  which  calculates,  not  a  distance  between  the 
closest  points,  but  a  distance  between  a  point  in  the  reference  data  set  and 
a  segment  between  two  closest  points  in  the  matching  data  set.  We  call  this 
algorithm  a  line-segment  ICP  algorithm. 
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4.2  Summary  of  the  NDT  algorithm 

In  the  NDT  algorithm,  a  target  space  is  divided  into  grids.  Then  the  distri¬ 
bution  of  scan  points  in  the  reference  data  set  in  one  grid  is  represented  by 
a  normal  distribution.  An  average  in  the  grid  i  is  represented  by  q i,  and  a 
covariance  matrix  in  the  grid  is  represented  by  27$.  Based  on  the  above  data, 
an  evaluation  function  is  defined  as  the  sum  of  the  matching  level  between 
a  point  x\  in  the  matching  data  set  and  the  normal  distribution  i,  which 
corresponds  to  point  as  follows: 


N- 1 


E(p)  =  2J  exp 


-K-QiY^K-Qi) 


(3) 


Detailed  explanations  and  equations  are  given  in  [2]. 

The  accuracy  of  the  result  depends  greatly  on  the  size  of  each  grid.  To 
improve  the  robustness  of  the  NDT  algorithm,  we  applied  an  algorithm  in 
which  the  size  of  each  grid  is  dynamically  changed.  At  first,  grid  size  is  large 
for  global  matching.  After  that,  the  matching  sequence  is  repeated  with  pro¬ 
gressively  smaller  grid  sizes.  In  our  implementation,  the  initial  size  of  the  grid 
is  20  [m],  the  second  size  is  15  [m],  and  the  final  size  is  10  [m].  We  call  this 
algorithm  the  Narrower  NDT  algorithm. 


4.3  Combination  of  the  ICP  and  NDT  algorithms 

To  construct  an  accurate  3-D  map  of  an  outdoor  environment,  we  applied 
the  above  scan-matching  algorithms  to  our  experimental  results,  as  shown 
in  section  3.5.  We  mounted  an  inclination  sensor  on  the  3-D  scanner,  so  the 
adjustment  parameters  in  this  scan  matching  are  the  position  (x,  y,  z)  and 
orientation  6  of  an  obtained  environment. 

Through  the  above  application  experience,  we  identified  the  following  com¬ 
parative  qualitative  features:  (1)  The  ICP  algorithm  is  more  accurate  than  the 
NDT  algorithm  when  the  matching  is  successful.  (2)  The  ICP  algorithm  be¬ 
comes  stuck  in  the  local  minima  much  more  easily  than  the  NDT  algorithm 
does.  Based  on  the  above  features,  to  pursue  both  accuracy  and  robustness 
for  scan  matching,  we  propose  a  combination  of  the  two  algorithms,  or  the 
NDT-ICP  Combination.  First,  the  Narrower  NDT  algorithm  is  applied,  and 
then  the  line-segment  ICP  algorithm  is  applied. 

Figure  8  shows  a  graph  comparing  the  above  three  matching  algorithms 
(left)  and  scan  locations  (right).  Table  1  shows  the  numerical  evaluation  values 
of  the  same  result.  A  “number- number”  represents  the  scan  location  indices 
for  matching;  for  example,  “2-3”  represents  a  match  between  scan  point  2  and 
scan  point  3.  Each  evaluation  value  is  the  ICP  score  calculated  by  Equation 
2.  The  smaller  of  the  evaluation  values  denotes  better  matching. 
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Fig.  8.  A  comparison  of  matching  algorithms  (our  campus) 


Based  on  the  comparison  re¬ 
sults,  the  NDT-ICP  Combination  al¬ 
gorithm  works  well.  In  the  case  of 
scan  matching  between  scan  points 
4  and  5,  the  solo  ICP  algorithm 
failed  to  match.  The  failed  example 
is  shown  in  detail  in  Figure  9.  White 
dots  are  scan  data  obtained  at  scan 
point  4,  and  green  dots  are  those  ob¬ 
tained  at  scan  point  5,  where  the 
scan  points  of  path  1  appeared.  The 
scan  data  of  path  2,  obtained  at  scan 
point  4,  disappeared.  However,  the 
scan  data  of  path  1  seemed  to  be 
pulled  by  the  scan  data  of  path  2.  Fig.  9.  Matching  result  “4-5”  by  solo 
Finally,  matching  failed,  as  shown  in  ^P  algorithm 
the  figure.  In  the  case  of  the  NDT- 

ICP  Combination  algorithm,  such  mismatching  did  not  happen.  In  the  case 
of  scan  matching  between  scan  points  3  and  4,  the  ICP  algorithm  was  slightly 
better  than  the  NDT-ICP  Combination  algorithm,  perhaps  because  the  con¬ 
vergence  direction  of  the  solo  ICP  algorithm  was  different  from  the  direction 
after  the  NDT  algorithm  was  applied  and  the  ICP  iteration  was  stopped  in 
the  different  situations. 


Table  1.  Scan  Matching  Comparison  :  Proposal 


1-2 

2-3 

3-4 

4-5 

5-6 

Line-segment  ICP 

301.0 

269.2 

169.2 

503.9 

378.4 

Narrower  NDT 

258.7 

292.1 

201.9 

245.6 

317.7 

Proposal  (NDT-ICP  Comb.) 

236.1 

262.7 

173.9 

226.5 

254.3 
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5  Conclusions  and  future  work 

In  this  paper,  we  have  proposed  a  scan-point  planning  algorithm  to  obtain  a 
large  scale  3-D  map  efficiently  and  a  combination  of  scan-matching  algorithms 
(NDT  and  ICP)  to  improve  mapping  accuracy.  Finally,  we  have  offered  an  ex¬ 
ample  of  mapping  in  an  outdoor  environment  to  confirm  the  validity  of  the 
above  approach.  We  have  also  applied  the  approach  to  two  different  envi¬ 
ronments,  a  small  natural  field  at  Mt.  Aosasa  in  Sendai  City  and  a  park  on 
our  campus  (without  large  buildings).  In  both  environments,  the  proposed 
approach  worked  well.  In  the  former  case,  the  target  environment  included 
small  trees,  and  the  solo  ICP  algorithm  became  stuck  in  the  local  minima  for 
scan  matching.  These  results  are  not  included  here  due  to  space  limitations. 

In  our  current  implementation  of  scan-point  planning,  we  have  not  con¬ 
sidered  the  moving  cost  of  mobile  robots.  Therefore,  the  next  scan  point  may 
be  far  from  the  current  scan  point,  as  happened  in  the  planning  procedure 
shown  in  this  paper.  In  the  future  works,  it  is  required  to  discuss  optimality 
of  the  viewpoint  selection  deeply.  Furthremore,  although  we  have  defined  scan 
resolution  simply  as  the  distance  to  an  object,  it  should  be  considered  in  the 
orientation  of  the  targets.  By  solving  the  above  problems,  we  aim  to  obtain  a 
feasible  and  accurate  3-D  map  in  an  outdoor  environment. 
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Image  and  Sparse  Laser  Fusion  for  Dense  Scene 
Reconstruction 
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Abstract  This  paper  is  concerned  with  reconstructing  the  metric  geometry  of  a  scene 
imaged  with  a  single  camera  and  a  scanning  laser.  Our  aim  is  to  assign  each  image 
pixel  with  a  range  value  using  both  image  appearance  and  sparse  laser  data.  We  pose 
the  problem  as  an  optimisation  of  a  cost  function  encapsulating  a  spatially  varying 
smoothness  cost  and  measurement  compatibility.  In  particular  we  introduce  a  second 
order  smoothness  term.  We  derive  cues  for  discontinuities  in  range  from  changes  in 
image  appearance  and  reflect  this  in  the  objective  function.  We  show  that  our  formu¬ 
lation  distills  down  to  solving  a  large  linear  system  which  can  be  solved  swiftly  using 
direct  methods.  Results  are  presented  and  analysed  using  synthetic  cases  to  demon¬ 
strate  salient  behaviours  and  on  real  data  to  highlight  real-world  applicability. 


1  Introduction  and  Motivation 


This  paper  is  about  dense  mapping  of  workspaces  using  common  place  cameras  and 
scanning  lasers.  Cameras  provide  near  instantaneous  capture  of  the  workspace’s  ap¬ 
pearance  (texture  and  colour)  but,  from  a  single  view,  little  geometrical  information. 
On  the  other  hand,  scanning  lasers  produce  comparatively  slow,  sparse  metric  sam¬ 
pling  and  beyond  reflectance,  capture  little  of  the  scene’s  appearance.  This  motivates 
us  to  consider  how  we  might  fuse  sparse  laser  data  and  images  to  infer  a  range  for 
every  pixel  in  the  image,  allowing  us  to  reconstruct  a  3D  scene  with  all  the  texture, 
colour  and  appearance  information  captured  in  the  original  image.  The  heart  of  the 
problem  is  how  to  sensibly  infer  ranges  for  pixels  which  are  not  near  any  laser  mea¬ 
surements  without  introducing  intolerable  distortions.  Our  method  is  general  in  that  it 
is  not  tied  to  any  particular  3D  laser  scanner  mechanism  or  geometry.  Note  also  that 
we  aim  to  recover  the  dense  geometry  of  a  scene  over  scales  which  prohibit  the  use  of 
other  direct  methods  such  as  stereo  unless  a  truly  large  baseline  is  used. 
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2  Related  Work 

The  problem  of  inferring  3D  surface  models  of  a  scene  using  laser  or  camera  sensors 
has  been  studied  extensively  over  many  years  (see,  for  example  [1,  2,  3,  4]).  How¬ 
ever,  limitations  in  hardware  and  a  requirement  for  speedy  data  gathering  in  mobile 
robotics  typically  results  either  in  high  resolution  optical  images  only  allowing  infer¬ 
ence  of  very  basic  3D  geometry,  or,  alternatively,  low  resolution  range  images  which 
often  sample  the  scene  too  sparsely  to  allow  for  faithful  reconstruction.  Multiple  view 
reconstruction  provides  an  attractive  alternative  due  to  a  near  instantaneous  gathering 
of  dense  3D  data  leading  to  dense  scene  reconstructions  from  image  data  alone  [5,  6]. 
Unfortunately,  stereo  reconstruction  fidelity  is  limited  in  range  by  the  baseline  and  the 
image  resolution.  This  seriously  impedes  accurate  reconstruction  beyond  a  few  meters 
from  the  camera.  Another  alternative  can  be  found  in  the  exploitation  of  the  comple¬ 
mentary  nature  of  vision  and  range  sensing.  While  optical  images  and  range  images 
represent  different  quantities,  they  share  “similar  second  order  statistics  and  scaling 
properties”  [7]. 

Only  a  relatively  small  body  of  work  exists  on  the  inference  of  surfaces  by  fusing 
laser  data  and  camera  images.  Usually,  these  techniques  exploit  the  fact  that  edges  in 
the  optical  image  often  correspond  to  discontinuities  in  depth,  and  that  smooth  sur¬ 
faces  tend  to  correspond  to  areas  of  similar  colour  and  texture.  In  [8],  depth  values  for 
pixels  in  an  image  are  inferred  using  belief  propagation  in  a  Markov  Random  Field 
(MRF)  framework.  The  technique  requires  that  the  supplied  range  measurements  con¬ 
tain  some  high  density  areas  from  which  to  seed  the  solution,  and  is  unable  to  assign 
depth  values  outside  of  those  already  in  the  measurements.  The  techniques  described 
in  [9],  [10]  and  [7]  are  able  to  fuse  the  information  from  both  sources  to  significantly 
improve  the  resolution  of  low  quality  range  images.  The  method  of  [9]  is  particularly 
relevant  to  this  work.  It  employs  an  MRF  formulation  with  a  first-order  smoothness 
prior.  The  technique  favours  fronto-parallel  surfaces,  but  does  not  suffer  too  greatly 
from  this  because  the  range  measurements  are  sufficiently  regular  and  dense,  com¬ 
ing  from  a  special  range  camera  sensor.  This  ‘pins’  the  estimates  to  lie  near  the  true 
surface. 

In  contrast  to  [9]  the  method  presented  here  is  targeted  at  any  combination  of  com¬ 
monly  available  monocular  camera  and  scanning  laser.  In  particular,  this  requires  in¬ 
ference  of  range  measurements  based  on  sparse,  inhomogenous  range  data.  In  such 
cases,  the  fronto-parrallel  tendency  of  inferred  surfaces  induced  by  only  considering 
a  first-order  smoothness  prior  leads  to  increasingly  inaccurate  reconstructions.  We  ad¬ 
dress  that  issue  by  introducing  a  second-order  smoothness  prior  while  still  framing  the 
problem  as  a  well-understood  optimization  of  a  linear  system  of  equations. 


3  Problem  Formulation 


In  this  section  we  shall  show  how  a  general  description  of  the  problem  can  be  for¬ 
mulated  in  such  a  way  that  in  the  end,  only  the  solution  of  a  single  linear  system  is 
required.  We  begin  by  introducing  our  notation. 
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We  are  given  a  u  by  v  pixel  image  and  a  3D  point  cloud  of  k  laser  measurements 
Jjf  =  {l\  •  •  -4}.  We  shall  use  the  notation  I;  to  represent  the  ith  pixel  in  a  vectorised 
image  (all  pixels  stacked  in  a  single  vector  of  length  N  =  uxv).  For  each  1/  we  asso¬ 
ciate  a  range  Xi.  Our  task  is  to  use  both  J?  and  j£?  to  find  a  vector  x  =  [x\ , X2  •  •  •  xn]t  - 
a  range  for  every  pixel  in  the  image.  We  shall  also  refer  to  v,-  as  a  “range  node”.  Each 
point  in  Jzf  can  be  projected  into  J?  under  a  distortion  correcting  camera  model  and 
associated  to  the  nearest  pixel.  Each  laser  point  then  yields  a  range  measurement  Zi 
tied  to  pixel  I/.  Note  the  laser  measurements  are  sparse  so  not  every  pixel  will  have  a 
range  measurement  —  in  fact  very  few  will.  We  use  the  notation  i  E  ££  to  imply  the 
index  variable  i  ranges  over  all  pixels  which  have  an  associated  range  measurement. 

We  shall  pose  the  problem  as  one  of  finding  the  optimal  range  vector  x*  such  that 


x*  =  argmin{AiA20j(x,I)  +  Ai(l  -  A2)0c(x,I)  +  (1  -  Xi)0d(x,z)}  (1) 


X 


where  0*(x,I)  is  a  first  order  cost  penalising  depth  discontinuities,  0c(x,I)  is  a  sec¬ 
ond  order  cost  penalising  curvature  and  0^(x,z)  is  a  data  cost  penalising  errors  be¬ 
tween  inferred  ranges  and  observed  range  measurements.  The  scalars  X\ .  X2  E  [0, 1] 
are  weightings  between  the  three  terms.  We  shall  now  consider  these  terms  in  more 
detail. 


3.0.1  Data  Cost 

The  data  cost  is  defined  as  a  squared  error  between  assigned  range,  jq  and  measured 
range,  n 


0rf(x,  z)  =  £  Gi{xi-Zi)2 
ieSf 


(2) 


=  ||W(x-z)||2 


(3) 


where  W  is  a  diagonal  matrix  with  entries 


(4) 


and  Gi  is  a  measure  of  our  confidence  in  measurement  Zi- 


3.0.2  Discontinuity  Cost 


As  in  [9],  we  use  a  depth  smoothness  or  first-order  prior  of  the  form 


0v(x.I)  =  £  £  eUj(xi-Xjf 


(5) 


i  jeJ\T(i) 


where  <yF(z)  are  the  horizontal  and  vertical  neighbours  of  i.  As  edge  strength  between 
nodes  we  use  an  exponentiated  L2  norm  of  the  difference  in  pixel  appearance 
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ehi 


(6) 


where  C;  is  the  RGB  colour  vector  of  pixel  i  and  Oj  is  a  tuning  parameter  (small  Oj 
increases  sensitivity  to  changes  in  the  image).  Equation  5  may  be  written  in  matrix 
form  as 


0,(x,I)  =  ||Sx||2  (7) 

where  each  row  of  S  represents  a  weighted  average  of  a  pair  of  adjacent  range  nodes. 


3.0.3  Smoothess/Curvature  Cost 

In  contrast  to  [9]  we  make  the  further  assumption  that  in  the  absence  of  cues  to  the  con¬ 
trary,  such  as  discontinuities  in  appearance,  the  gradient  of  surfaces  varies  smoothly. 
Under  this  second  order  smoothness  assumption,  given  a  neighbourhood  ^V{i)  of 
node  Xi  we  may  make  a  range  prediction  Xi  as  a  linear  combination  of  neighbouring 
ranges  xj  for  j  E  This  allows  us  to  write  simply 

x  =  Px  (8) 

where  P  is  a  suitably  formed  prediction  matrix.  We  define  curvature  cost  0c(x,I)  in 
the  form 


@c(x,I)  =  ||x-x||2  (9) 

=  ||(P-l)x||2  (10) 

Here,  1  is  the  identity  matrix.  While  details  of  how  P  is  created  will  be  postponed  until 
Section  4  we  may  proceed  by  understanding  this  cost  as  the  L2  norm  of  the  deviation 
of  x  from  the  prediction  based  on  modeling  surfaces  as  locally  continuous  and  smooth. 

3.1  Reduction  to  Ax  —  b 

We  may  further  expand  Equation  3  to  the  form 

0rf(x,z)  =xtWtWx-2ztWtWx  +  ztWtWz  (11) 

and  Equations  10  and  5  to 

0c(x,I)  =  xtRtRx,  ©s(x,I)  =  xTSTSx  (12) 


where  R  =  P  —  1. 

Substituting  Equations  11,12  into  1  and  solving  for  x  reduces  the  problem  to 


Ax  =  b 


(13) 
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with 

b  =  WTWz  (14) 

4  _  AiA2RTR  +  Ai(1-A2)StS+(1-Ai)WtW 

A“  1-Aj  (i5) 

Equations  13  to  15  imply  that  all  we  need  to  do  to  perform  the  optimization  is  to 
solve  a  large  sparse  linear  system. 

4  Constructing  The  Prediction  Matrix 


In  this  section  we  detail  how  the  prediction  matrix  P  is  created.  For  simplicity  we  show 
only  ID  cases  but  it  should  be  noted  that  P  contains  elements  to  penalise  curvature  in 
both  horizontal  and  vertical  directions. 

We  decompose  P  into  a  weighted  sum  of  three  prediction  operators  -  extrapolation 
from  left  and  right,  and  interpolation. 


P  =  WLPL  +  WMPM  +  W*  P*  (16) 

where  subscripts  L,M,7?  imply  left-extrapolation,  mean  (interpolation)  and  right- 
extrapolation  respectively.  The  W’s  are  suitably  constructed  weighting  matrices  de¬ 
rived  from  image  appearance  which  we  shall  expand  upon  shortly  in  Section  4.1.  The 
use  of  extrapolation  and  interpolation  can  be  understood  graphically  with  reference  to 
Fig.  1  which  shows  a  simplified  ID  case. 


left  extrapolate 


® 


@ 

interpolate  / 

© 

(gf 

•  (Xq)  weighted  prediction 

;  right  extrapolate  ; 

!  O 

o 

0  0  O  i 

Image  Pixels 

Fig.  1  Depth  prediction  via  weighted  interpolation  and  extrapolation  in  ID.  The  predictions  of  the 
range  xq  by  left  and  right  extrapolation  and  interpolation  are  shown  in  faded  grey.  The  discontinuity  in 
the  image  shown  at  the  bottom  of  the  figure  (each  range  node  has  a  single  pixel  attached  to  it)  causes 
the  left  extrapolation  to  be  down-weighted  —  the  image  edge  is  a  cue  for  a  possible  discontinuity  in 
range  between  node  jc_i  and  x0.  The  final  prediction,  xq  is  shown  in  the  center. 
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4.1  Anticipating  Depth  Discontinuities  from  Image  Cues 

The  image  can  be  used  to  provide  cues  about  the  behaviour  of  the  surface  we  hope 
to  reconstruct.  Our  basic  assumption  is  one  that  has  been  used  before  [9]  —  sharp 
changes  in  range  tend  to  appear  as  changes  in  appearance  (edges)  in  an  image.  We 
have  a  range  node  for  each  pixel  (see  Equation  16)  and  its  value  can  be  predicted  by 
a  weighted  sum  of  extrapolation  and  interpolation  from  its  neighbours.  We  describe 
only  the  horizontal  case  for  simplicity,  but  our  method  is  applied  in  the  vertical  case 
too.  For  each  node  X[  the  weighting  is  determined  by  the  properties  of  pixel  i  and  its 
neighbourhood.  Broadly  speaking,  if  a  pixel  is  identical  to  its  left  and  right  neighbours 
then  pure  interpolation  will  occur.  If  however  there  is  a  discontinuity  in  pixel  appear¬ 
ance  then  interpolation  will  be  down  weighted  and  either  left  or  right  extrapolation 
emphasised. 

To  explain  how  the  weighting  matrices  W l,m,r  are  created  we  shall  consider  the 
simple  ID  case  shown  in  Fig.  2.  Interpolation  is  preferable  to  extrapolation.  With  this 
preference  in  mind  and  considering  node  xo  in  Fig.  2,  we  can  write  the  importance 
weights  of  left  /  right  extrapolation  and  interpolation  as  m  r 

wm  =  e(_i}o)£(o,i)  (17) 

Wr  =  «(-2,-l)e(-l,0)(l  ~wm)  (18) 

wl  =  e(2, 1)«(1, 0)(!  ~wm)  (19) 

with  eij  as  defined  in  Equation  6.  The  above  relationships  can  be  understood  by  not¬ 
ing  that  if  the  pixel  attached  to  range  node  xo  is  identical  to  its  neighbours  (£(_10) 
and  £(q,i)  are  unity)  then  wm  =  1  and  wr  =  wi  =  0  -  interpolation  has  100%  of  the 
weighting.  As  the  pixels  I_i  and  Ii  become  increasingly  different,  the  left  and  right 
extrapolations  receive  more  weight.  In  the  limit,  if  two  pixels  are  entirely  different, 
the  edge  weight  between  them  tends  to  zero  and  the  attached  range  nodes  will  have 
no  direct  link  between  them.  It  does  not  make  the  two  nodes  independent  -  there  may 
be  other  dependencies  via  long  circuitous  routes  through  other  nodes.  It  does  however 
mean  that  range  discontinuities  across  this  boundary  are  not  penalised  because  the 
range  prediction  made  by  multiplication  by  P  is  based  on  an  extrapolation  from  one 
side  and  not  an  interpolation  across  the  discontinuity.  This  is  a  key  point  in  this  work. 


Fig.  2  A  ID  chain  of  range  nodes  (a  section  of  x)  and  the  edges  between  neighbours.  Considering 
xo,  right  extrapolation  uses  only  nodes  to  the  right  and  left  extrapolation  uses  the  two  left  hand  nodes. 
Interpolation  uses  nodes  x_i  and  x\.  The  edges  between  nodes  are  a  function  of  the  difference  in 
pixel  appearance  between  adjacent  range  nodes  (each  range  node  is  associated  with  a  single  pixel  in 
the  image). 
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5  Results 

Fig.  3  shows  the  results  of  processing  two  synthetic  scenes.  In  this  case  the  problem 
size  is  small  with  x  having  just  2500  elements  (each  element  of  x  corresponds  to  a 
vertex  in  the  mesh).  With  regard  to  the  “three  plane”  case  note  how  using  just  a  few 
laser  points  in  each  distinct  region  of  the  image  results  in  three  distinct  planes  being 
generated  in  the  reconstructed  scene.  The  strong  edges  in  the  images  prohibit  informa¬ 
tion  flow  between  planes.  For  the  nodes  at  the  very  edge  of  a  plane  the  extrapolation 
and  interpolation  weights  have  become  such  that  the  node  is  only  influenced  by  (cou¬ 
pled  to)  other  in-plane  nodes.  The  1st  order  method  alone  is  unable  to  reconstruct  the 
planes  correctly  as  it  tries  to  make  all  nodes  have  similar  ranges. 

In  the  case  of  the  “dome”  example  note  how  while  there  is  no  range  discontinuity 
there  is  a  sharp  discontinuity  in  surface  gradient  around  the  perimeter  of  the  dome. 
Note  also  that  the  first  order  smoothness  term  is  unable  to  reconstruct  the  curvature 
of  the  dome  in  the  absence  of  laser  measuremnents.  In  contrast,  with  a  second  order 
smoothness  cost  the  curved  shape  of  the  dome  is  recovered  well.  This  is  an  important 
result.  The  generated  curved  surface  is  the  smoothest  surface  that  can  explain  the 
existing  measurements  and  minimise  the  bust  in  second  order  smoothness  constraints 
implicit  in  P. 


(a)  Three  Planes 


(b)  1st  order  smoothness 


(c)  2nd  order  smoothness 


(d)  Dome  Image 


(e)  1st  order  smoothness 


(f)  2nd  order  smoothness 


Fig.  3  Synthetic  data  examples  which  highlight  important  aspects  of  our  approach.  Each  node  in 
the  mesh  represents  a  single  range  node  projected  out  from  an  image  pixel.  The  images  for  each  of 
the  two  cases  are  shown  on  the  left.  In  all  figures  sparse  laser  measurements  are  shown  in  red.  Note 
how  the  discontinuities  in  the  image  appear  as  discontinuities  in  the  reconstructed  surfaces.  First- 
order  smoothness  alone  tends  to  make  surfaces  have  the  same  depth  value  whereas  second-order 
smoothness  is  able  to  correctly  reproduce  both  planar  and  curved  surfaces. 


We  now  turn  to  processing  some  real  data.  We  used  a  nodding  SICK  LMS200  laser 
scanner  on  a  mobile  robot  to  capture  laser  data.  Images  were  captured  by  a  camera 
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mounted  above  the  laser  with  a  wide  angle  lens.  The  image  used  in  this  case  was  518 
by  259  pixels  resulting  in  some  134,162  range  nodes  and  is  shown  in  Fig.  4  with  laser 
measurements  projected  into  it.  For  scale,  the  target  is  approx  1.7m  wide.  The  recon¬ 
structed  model  is  shown  alongside.  Using  second-order  smoothness  alone  provides 
reasonable  results,  but  tends  to  introduce  ‘rippling’  artefacts  around  noisy  measure¬ 
ments.  A  small  amount  of  first-order  smoothness  is  necessary  to  damp  the  oscillations. 
Fig.  5  shows  points  of  interest  in  the  reconstruction.  We  show  an  outdoor  result  of  the 
same  problem  size  in  Fig.  6. 


Fig.  4  Results  from  an  indoor  dataset.  Image  and  laser  measurements  on  the  left,  and  the  recon¬ 
structed  model  on  the  right. 


Fig.  5  Details  of  a  reconstructed  scene  from  Fig.  4.  Note  the  detail  of  the  smooth  floor  and  inferred 
sharp  range  discontinuity  between  two  walls. 


The  algorithm  is  implemented  in  Matlab  and  the  linear  solve  is  performed  with 
Matlab’s  backslash  operator  (though  there  is  no  reason  not  to  use  another  method 
such  as  Conjugate  Gradient).  The  Three  Planes  case  and  the  Dome  case  in  Fig.  3,  with 
2,500  nodes  both  took  0.021  seconds  to  solve  in  a  single  iteration.  For  the  real  data 
case  in  Fig.  4  with  134,162  nodes,  the  algorithm  took  around  30  seconds  on  a  2Ghz 
dual  core  laptop. 

We  now  present  some  numerical  analysis  of  the  performance  of  our  approach.  It 
is  a  hard  task  to  obtain  a  ground  truth  geometry  for  the  complete  real  scene.  Instead 
of  comparing  pixel  ranges  to  ground  truth  we  compare  them  to  laser  measurements 
taken  of  the  scene  over  a  long  period  of  time  and  which  are  not  used  in  the  optimi¬ 
sation.  Concretely,  we  collect  a  very  dense  cloud  of  laser  data  at  the  scene  and  draw 
from  that  a  small  sparse  test  set  with  which  we  reconstruct  the  scene  shown  in  Fig.  4. 
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Fig.  6  Results  from  an  outdoor  dataset.  On  the  left  is  the  image  with  laser  measurements  overlaid. 
On  the  right  is  the  reconstructed  model. 


The  remaining  laser  data  constitutes  a  dense  hold  out  set,  and  for  each  unused  laser 
measurement  we  can  compare  measured  range  to  estimated  range.  Fig.  7(a)  shows 
regions  of  the  workspace  which  contain  pixels  with  significant  errors. 


Fig.  7  The  left  image  shows  a  comparison  of  range  estimates  to  ground  truth  laser  data  for  the  indoor 
case.  Areas  in  yellow  show  deviation  from  ground  truth,  with  higher  intensity  representing  larger 
errors.  Laser  measurements  are  shown  in  red.  The  graph  shows  average  error  of  the  estimate  relative 
to  the  mean  density  of  range  measurements,  when  compared  to  laser  measurements  in  the  hold  out 
set.  The  laser  has  a  precision  of  15mm. 


It  is  also  instructive  to  consider  how  the  accuracy  of  our  approach  depends  on 
the  density  of  laser  measurements.  Fig.  7(b)  shows  how  the  statistics  (mean  and  me¬ 
dian)  of  the  pixel  range  errors  change  as  a  function  of  measurement  density.  Note 
that  as  expected,  as  measurement  density  increases  the  precision  tends  to  that  of  the 
laser  itself  around  15mm.  The  results  given  in  Figs.  4  and  6  are  operating  in  the  0.01 
measurements/pixel2  region. 


6  Conclusion 

This  paper  has  introduced  a  novel  technique  for  fusing  sparse  laser  data  and  images 
to  enable  a  dense  3D  scene  reconstruction.  Above  and  beyond  existing  prior  work  this 


10 


Alastair  Harrison  and  Paul  Newman. 


technique  uses  a  second  order  smoothness  term  which  allows  it  to  extrapolate  both 
planar  and  curved  surfaces.  The  problem  is  formulated  as  the  solution  of  a  sparse 
linear  system,  which  allows  the  use  of  fast  optimization  techniques.  The  technique 
was  applied  to  both  illustrative  synthetic  cases  as  well  as  real  data  recorded  in  indoor 
and  outdoor  scenes  containing  challenging  geometry. 

The  qualitative  and  quantitative  results  presented  here  suggest  that  our  system  pro¬ 
vides  3D  reconstructions  of  reasonable  quality.  Nevertheless,  there  is  room  for  im¬ 
provement.  In  particular  we  must  consider  how  we  can  increase  robustness  to  erro¬ 
neous  laser  measurements  (away  from  image  edges)  and  how  we  might  fuse  multiple 
scenes  in  a  principled  way.  The  flip  side  of  this  problem  is  handling  bona-fide  discon¬ 
tinuities  in  range  when  there  is  no  change  in  image  appearance  and  vice  versa. 
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Relative  Motion  Threshold  for 
Rejection  in  ICP  Registration 


Francois  Pomerleau,  Francis  Colas,  Francois  Ferland  and  Francois  Michaud 


Abstract  Simultaneous  Localization  and  Mapping  (SLAM)  iteratively  builds 
a  map  of  the  environment  by  putting  each  new  observation  in  relation  with 
the  current  map.  This  relation  is  usually  done  by  scan  matching  algorithms 
such  as  Iterative  Closest  Point  (ICP)  where  two  sets  of  features  are  paired. 
However  as  ICP  is  sensitive  to  outliers,  methods  have  been  proposed  to  reject 
them.  In  this  article,  we  present  a  new  rejection  technique  called  Relative  Mo¬ 
tion  Threshold  (RMT).  In  combination  with  multiple  pairing  rejection,  RMT 
identifies  outliers  based  on  error  produced  by  paired  points  instead  of  a  dis¬ 
tance  measurement,  which  makes  it  more  applicable  to  point-to-plane  error. 
The  rejection  threshold  is  calculated  with  a  simulated  annealing  ratio  which 
follows  the  convergence  rate  of  the  algorithm.  Experiments  demonstrate  that 
RMT  performs  better  than  former  techniques  with  outliers  created  by  dy¬ 
namical  obstacles.  Those  results  were  achieved  without  reducing  convergence 
speed  of  the  overall  ICP  algorithm. 

Key  words:  ICP,  registration,  scan  matching,  rejection,  SLAM. 


1  Introduction 

Simultaneous  Localization  And  Mapping  (SLAM)  algorithms  use  motion  and 
observation  probabilistic  models  to  incrementally  correct  positioning  prob¬ 
lems.  The  mechanism  used  to  transform  different  observation  models  into  the 
same  coordinate  system  is  called  registration  (also  known  as  data  association 
or  scan  matching).  Proposed  SLAM  solutions  based  on  Maximum  Likelihood 
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(ML)  present  fast  capabilities  to  minimize  global  positioning  errors  [7],  but 
they  still  need  to  rely  on  efficient  and  robust  registration  algorithms  to  be 
stable  in  real  robotic  applications. 

Registration  can  be  done  using  landmarks  (e.g.,  lines,  circles,  arcs,  cor¬ 
ners)  [1].  When  applied  to  the  registration  processes,  landmarks  confer  the 
advantage  of  accelerating  calculation  by  summing  up  information.  However, 
landmark  registrations  can  be  sensitive  to  unstructured  environments  where 
landmarks  are  difficult  to  detect.  A  second  type  of  registration  is  called  Nor¬ 
mal  Distribution  Transform  (NDT).  NDT  segments  spatial  information  and 
works  on  the  first  and  second  statistical  moments  to  reduce  the  computa¬ 
tional  cost  [9]  while  avoiding  to  define  specific  landmarks.  However,  NDT 
is  still  very  sensitive  to  segmentation  because  large  spatial  cells  filter  out 
relevant  details,  whereas  small  cells  augment  the  computational  cost. 

Another  strategy  is  to  directly  use  point  clouds  derived  from  exteroceptive 
data.  One  technique  to  find  such  matches  is  known  as  Iterative  Closest  Point 
(ICP)  [2].  This  method  pairs  points  of  both  scans  by  finding  for  each  point 
of  the  first  scan  the  nearest  point  in  the  second  one.  From  these  pairs  a 
motion  vector  is  estimated  to  cope  for  their  misalignment.  This  process  is 
iterated  until  convergence.  ICP  variants  were  first  developed  for  applications 
involving  3D  model  reconstructions  [4],  [6],  [8].  When  used  in  SLAM  by  an 
autonomous  robot,  these  algorithms  need  to  be  adapted  in  several  ways:  1) 
they  must  work  in  real-time  [3];  2)  they  must  be  adapted  to  the  sensors  to 
be  able  to  use  2D  and  3D  spatial  information  [11],  and  to  cope  with  sensor 
fusion  (range,  laser  reflectivity  [14],  color  [13],  etc.);  3)  they  must  be  able  to 
deal  with  occlusion  and  partially  overlapping  scans  that  frequently  arise  in  a 
dynamic  environment  explored  by  a  mobile  platform. 

This  paper  addresses  the  issue  of  occlusions  and  partially  overlapping  scans 
by  using  a  new  adaptive  rejection  technique  called  Relative  Motion  Threshold 
(RMT).  For  SLAM,  changes  in  the  environment  and  occlusions  caused  by  the 
motion  of  the  mobile  robot  are  sources  of  outliers  (i.e.,  points  with  no  match). 
Fig.  1  presents  an  example  for  which  an  optimal  rotation  and  translation 
of  the  blue  point  cloud  (p^  must  be  applied  to  align  it  with  the  reference 
red  point  cloud  (qj),  with  i  and  j  being  point  indexes.  Even  after  a  small 
displacement  of  the  robot  between  t\  and  £2,  maps  can  largely  differ  due 
to  sensor  occlusion.  For  example,  in  Fig.  1(b),  50%  of  the  blue  points  are 
outliers  when  compared  to  the  red  point  cloud.  Also,  the  disambiguation  of 
points  obtained  from  obstacle  1  and  obstacle  2  must  be  resolved  to  make 
ICP  robust  to  initial  positioning  errors  and  overlapping.  Removing  outliers 
from  the  paired  points  is  done  during  the  rejection  step  of  ICP,  where  it 
minimizes  misalignment  errors  for  the  determination  of  the  motion  vector 
between  two  point  clouds.  The  rejection  technique  introduced  in  this  paper 
uses  an  adaptive  threshold  based  on  simulated  annealing  ratio  to  augment 
the  robustness  of  ICP  against  outliers. 

The  paper  is  organized  as  follows.  Section  2  presents  an  overview  of  rejec¬ 
tion  techniques.  Section  3  describes  the  RMT  rejection  technique  we  propose. 


Fig.  1  Map  registration  example  for  SLAM,  with  misalignment  caused  by  odometry  error. 
The  green  lines  represent  the  alignment  error  minimized  by  ICP  algorithms.  Obs.:  Obstacle, 
(a)  Two  laser  scans  taken  at  time  t\  and  £2-  (b)  Point  cloud  pi  (blue)  taken  at  £2  is 
represented  in  the  coordinate  system  of  point  cloud  qj  (red)  taken  at  £1 .  Black  arrows  are 
surface  orientations. 


Section  4  presents  experimental  results  that  evaluate  the  performance  of  each 
rejection  technique  in  terms  of  matching  and  convergence  rate  of  ICP  in  sim¬ 
ulated  and  real-world  applications. 


2  Rejection  Techniques 

Rejection  techniques  can  be  categorized  as  follows: 

•  Fix:  Manual  setting  of  the  maximum  distance  d  authorized  between  paired 
points.  Then,  all  paired  points  with  a  distance  higher  than  d  are  systemati¬ 
cally  rejected.  This  method  is  simple  but  does  not  adapt  well  to  conditions 
that  would  require  different  thresholds. 

•  Zhang :  Strategy  based  on  statistical  moments  of  the  distribution  of  the 
distances  between  the  paired  points  [15].  Four  conditions  are  needed  to 
adapt  the  threshold  d: 


'  /a  +  3<j,  if  0  <  p  <  77 
H  +  2cr,  if  T]  <  n  <  3/7 

d  =  (1) 

/i  T  cr,  if  377  <  p  <  677 

k  p,  otherwise 

where  p  is  the  median  of  the  distance  between  paired  points  and  77  is  a 
distance-based  parameter  set  by  the  user.  This  method  can  be  adjusted  to 
different  statistical  distributions  of  the  distances  but  still  lacks  generality. 
•  Mean :  Technique  proposed  in  [6]  which  sets  d  equal  pH- <7  for  each  iteration, 
and  where  p  and  <7  are  respectively  the  mean  and  the  standard  deviation 
of  the  distances  between  paired  points.  This  method  has  no  parameter 


and  is  flexible  to  different  type  of  outliers.  However  it  filters  outliers  on 
the  assumption  that  the  distribution  of  distances  is  Gaussian,  which  is  not 
a  good  assumption  in  the  case  of  a  dynamic  environment  where  several 
local  minima  can  emerge. 

Median :  An  other  statistical  method  proposed  by  [5]  fixes  d  to  3  times  the 
median  of  the  distances  between  paired  points.  The  method  has  no  param¬ 
eter,  but  calculating  a  median  over  a  huge  point  cloud  is  computationally 
expensive. 

Trim.:  Overlapping  parameter  £  defined  by  [4]  is  used  to  reject  a  percentage 
of  outliers: 


A trimmed  —  ^,-^total 


(2) 


where  N  is  the  number  of  paired  points.  This  approach  is  less  dependent  on 
the  shape  of  the  distribution.  However,  it  requires  to  sort  all  paired  points 
based  on  their  distances  at  each  iteration,  which  increases  computation 
time.  It  can  also  misled  by  a  large  change  in  the  overlap  that  may  occur 
with  a  moving  platform  due  to  occlusions. 


In  addition  to  these  techniques,  it  is  also  possible  to  reject  multiple  pairing 
to  a  single  point  [16],  [3],  as  shown  in  the  lower  right  part  of  Fig.  1(b)  where 
multiple  green  lines  connect  to  the  same  red  points.  Instead  of  authorizing 
all  pairs,  only  the  one  with  the  smallest  distance  is  kept.  This  criterion  has 
been  shown  to  improve  the  performance  of  all  standard  rejection  techniques, 
and  thus  all  results  presented  here  use  this  additional  criterion. 


3  Relative  Motion  Threshold  Technique  for  Rejection 

Existing  rejection  techniques  rely  mostly  on  the  Euclidean  distance  between 
paired  points.  While  this  distance  has  a  direct  impact  on  point-to-point  error 
metric,  ICP  implementations  commonly  use  a  point-to-plane  error  metric 
to  pair  the  points  because  of  its  faster  convergence  speed.  This  point-to- 
plane  error  metric1  assumes  that  there  is  a  local  surface  orientation  vector 
estimated  for  each  point  qj  and  projects  the  Euclidean  distance  between 
and  qj  on  this  vector.  The  point  paired  to  qj  then  minimizes  this  error  and 
not  the  Euclidean  distance  (see  Fig.  1(b)).  We  introduce  a  new,  more  general, 
rejection  technique  called  Relative  Motion  Threshold  (RMT).  RMT  is  an 
adaptive  rejection  technique  that  progressively  identifies  outliers  that  create 
most  of  the  error  during  the  process  of  ICP.  Adaptation  is  based  directly  on 
the  error  created  by  paired  points  instead  of  the  Euclidean  distance  between 
those  points  in  accordance  to  the  matching  process.  We  propose  to  reject  the 
outliers  with  a  maximum  authorized  error  et  at  iteration  £,  evaluated  by: 


1  In  the  remaining  of  the  text,  the  term  error  will  refer  to  the  point-to-plane  error  metric 
when  there  is  no  ambiguity. 


et  = 


(3) 


Aet-i,  if  A  <  1 
et_i,  otherwise 


with  A  being  a  simulated  annealing  ratio  defined  by: 


A  \\Tt-i\\ 

m-2\\ 


(4) 


This  ratio  uses  past  motion  information  to  determine  if  the  point  cloud 
is  converging  to  a  local  minima,  where  ||T||  is  the  Euclidean  norm  of  the 
translation  vector  T  which  minimize  the  alignment  error  of  pi  at  iteration  t. 
The  translation  vector  T  and  the  rotational  vector  Q  are  calculated  during 
the  Error  and  Minimization  step  at  the  end  of  each  iteration  and  are  used  to 
move  pi  toward  qi.  A  ratio  A  smaller  than  1  means  that  the  position  of  pi  is 
stabilizing.  All  points  with  a  translation  error  larger  than  et  +  e  are  identified 
as  outliers  and  rejected  during  the  iteration  t.  If  the  ratio  A  is  larger  than  1, 
the  motion  of  point  cloud  pi  is  accelerating  toward  qi  due  to  new  appearing 
constraints.  The  maximum  authorized  error  et  is  then  kept  stable  until  the 
point  cloud  starts  to  converge  again. 

The  minimum  error  e  is  the  only  parameter  needed  for  our  rejection  tech¬ 
nique.  It  represents  noises  from  sensor  readings.  A  simple  way  to  evaluate 
this  parameter  is  to  take  two  scans  of  a  static  environment  and  look  at  the 
distribution  of  translation  errors  created  by  the  error  metric  used.  This  trans¬ 
lation  error  should  be  centered  on  zero  and  can  be  estimated  by  a  Gaussian 
distribution.  The  parameter  e  can  be  estimated  using  the  standard  devia¬ 
tion  of  this  distribution,  making  the  parameter  sensor-dependent  instead  of 
situation-dependent . 


(a)  (b) 

Fig.  2  (a)  A  generic  example  of  maximal  authorized  error  based  on  RMT  in  function  of 
iterations.  Dashed  line  represents  minimum  error  e.  (b)  Final  position  of  the  registered 
point  clouds  used  in  the  generic  example. 


Fig.  2(a)  presents  an  example  of  the  relative  motion  threshold  in  function 
of  iterations.  During  the  two  first  iterations,  only  the  rejection  of  multiple 
pairings  is  active  to  initialize  a  value  for  Tt-\  and  Tt- 2-  Then,  e2  is  initialized 


to  the  maximum  Euclidean  norm  of  translation  error  at  iteration  2.  The  sim¬ 
ulated  annealing  ratio  reduces  this  error  until  the  point  cloud  Pi  temporarily 
stops  converging.  The  ratio  A  is  higher  than  1  between  iteration  6  and  7, 
forcing  a  constant  error  threshold  until  the  point  cloud  Pi  start  converging 
again.  The  final  threshold  error  is  reduced  until  iteration  11  where  it  equals 
the  minimum  error  e  represented  by  the  dashed  line.  Fig.  2(b)  shows  the 
two  point  clouds  used  in  this  example  at  iteration  11.  The  point  clouds  were 
taken  in  a  room  where  two  boxes  were  moved  in  different  directions.  The 
thick  points  in  light  blue  represent  pi  which  converges  to  the  right  position 
even  with  outliers  caused  by  dynamic  obstacles  (i.e.,  boxes).  Small  black  cross 
represent  outliers  detected  by  RMT. 


4  Experimental  Results 

Experiments  were  conducted  in  simulation  and  real  settings  using  the  follow¬ 
ing  ICP  algorithm  consisting  in  five  steps  [12]: 

1.  Selection  reduces  the  number  of  points  in  pi  by  selecting  a  representative 
subset  of  points  ps,  where  s  <  i.  This  step  is  a  compromise  between  com¬ 
putation  speed  and  robustness.  Even  if  using  a  smaller  number  of  points 
results  in  faster  computation  time,  the  result  may  diverge  if  not  enough 
points  are  used,  or  if  the  selection  process  filters  out  necessary  constraints. 

2.  Matching  pairs  each  point  of  ps  in  the  point  cloud  qj.  This  corresponds 
to  a  closest  point  search  problem.  One  data  structure  often  used  to  solve 
this  problem  is  the  k-d  tree.  It  is  a  data  structure  that  partitions  the  space 
into  k  dimensions,  with  the  property  of  accelerating  the  nearest  neighbor 
search.  Recently,  utilization  of  the  approximate  k-d  tree  [8]  has  shown  to 
give  faster  results  without  altering  ICP  precision. 

3.  Weighting  improves  or  reduces  impacts  of  pairing  point  on  the  error  ma¬ 
trix  by  using  criteria  such  as  distance,  normal  compatibilities  and  scanner 
noise.  However,  results  suggest  that  weighting  is  data-dependent  and  does 
not  increase  convergence  rate  significantly  [12]. 

4.  Rejection  uses  techniques  described  in  Section  2  and  3. 

5.  Error  and  minimization  use  all  the  remaining  matched  points  to  evaluate 
the  misalignment  error  and  a  create  a  motion  vector  m  =  [T,  ft]'  minimiz¬ 
ing  this  error  where  T  is  the  translation  components  and  Q  the  orientation 
components.  This  motion  vector  is  applied  to  the  point  cloud  pi.  Point- 
to-plane  error  function  is  shown  to  have  a  faster  convergence  rate  than 
point-to-point  error  [12]. 

Steps  2  to  5  are  repeated  until  any  of  the  ending  condition  is  reached. 

Several  ending  conditions  have  been  proposed,  e.g.  number  of  iterations,  er¬ 
ror,  relative  motion  between  two  iterations  [4],  [16],  stabilization  of  mean 


and  standard  deviation  of  the  distances  between  paired  points,  number  of 
registered  pairs  [6].  Our  complete  ICP  algorithm  uses  all  of  those. 


J^.l  Evaluation  method 

To  test  the  RMT  rejection  technique  while  dealing  with  outliers  or  occlusion 
caused  by  moving  objects,  we  enriched  the  test  protocol  described  in  [10]. 
More  specifically,  we  recorded  data  taken  by  a  SICK  LMS  200  laser  range 
finder  in  a  U-shaped  room.  Without  moving  the  sensor,  we  added  or  moved 
boxes  in  its  field  of  view.  This  way  we  generated  10  pairs  of  different  scans 
with  an  overlapping  ratio  around  75%.  For  each  trial,  one  of  the  two  scans 
was  transformed  with  a  rotation  and  a  translation  vector  drawn  randomly 
according  to  a  Gaussian  distribution  in  order  to  fit  the  uncertainty  of  the  lo¬ 
calization  of  standard  SLAM  techniques.  The  standard  deviations  were  0.15  m 
for  each  translation  component  and  0.15  rad  for  the  angle.  Fig.  3(a)  shows  an 
example  of  two  scans  as  well  as  the  distribution  of  displacement  of  the  second 
scan.  We  can  see  that  in  the  second  scan,  one  of  the  boxes  moved  while  the 
other  was  removed.  As  the  sensor  is  fixed  between  the  scans,  the  result  of  the 
registration  algorithm  is  exactly  the  inverse  of  this  transformation.  Fig.  3(b) 
shows  the  results  for  each  rejection  techniques.  Curves  represent  the  mean  of 
the  XY  alignment  error  of  pi  over  4000  trials.  For  rejection  techniques  that 
require  the  setting  of  parameters,  optimal  values  were  derived  by  sampling 
the  parameter  space  and  computing  the  percentage  of  good  registration  over 
a  training  set.  The  final  performances  were  evaluated  using  the  remaining 
configurations.  Theses  optimal  parameters  are  presented  in  Table  1.  RMT 
provides  a  large  improvement  over  the  other  rejection  techniques,  since  other 
methods  tend  to  wrongly  categorize  points  as  outliers  and  converge  towards 
local  minima. 


Fig.  3  (a)  Two  scans  to  be  matched.  The  first  scan  is  in  red,  whereas  the  second  scan, 
after  translation  and  rotation,  is  in  thick  blue.  The  green  crosses  show  the  distribution  of 
displacement  error  used  in  our  test  with  a  standard  deviation  of  0.15  on  x-  and  y-axis.  (b) 
Comparison  of  the  performance  of  several  rejection  methods  in  function  of  iterations.  The 
performance  is  measured  by  the  mean  position  error  in  respect  to  ground  truth. 


|  ||  RMT 

|  Median 

|Trim. 

|Mean 

|Fix 

| Zhang  | 

|  Parameter  1 1  e  =  0.05 

|none 

\i  =  76% 

|none 

|  d  =  0.3  m 

1 77  =  0.02  m  | 

Table  1  Parameters  used  for  each  rejection  techniques  during  comparison  test. 


In  terms  of  speed,  ICP  is  an  iterative  algorithm  known  to  converge  in  a 
small  number  of  iterations.  RMT  rejection  method  does  not  impair  the  con¬ 
vergence  rate  of  the  matching  algorithm.  The  mean  and  covariance  on  the 
number  of  iterations  for  the  Median  and  RMT  are  respectively  (/x  =  9.5, 
a  =  2.4)  and  (/x  =  9.9,  a  =  1.9).  Those  results  were  obtained  while  keep¬ 
ing  registration  converging  to  the  right  value  for  4000  trials.  No  significant 
difference  were  observed  between  all  rejection  techniques  tested. 

Looking  at  how  rejection  techniques  perform  with  shifting  initial  positions, 
Table  2  presents  the  correct  registration  computed  for  4000  trials  of  each 
rejection  technique  for  various  standard  deviations  on  the  initial  error.  RMT 
rejection  technique  performs  better  than  the  others  for  all  conditions  tested 
by  having  the  lowest  residual  error.  Moreover,  a  rise  in  the  initial  position 
variance  decreases  the  performance  for  every  methods,  as  expected.  For  this 
setup,  it  means  that  uncertainty  on  the  position  of  the  robot  should  be  kept 
under  15  cm  before  applying  ICP  to  achieve  good  registration.  However,  RMT 
rejection  method  is  more  robust  than  other  techniques  as  performance  loss 
occurs  at  a  higher  variance  while  being  less  computationally  expensive  than 
median  technique. 


Std 

RMT 

Median 

Trim. 

Mean 

Fix 

Zhang 

0.05 

0.001 

0.005 

0.013 

0.009 

0.027 

0.021 

0.10 

0.004 

0.016 

0.015 

0.013 

0.029 

0.056 

0.15 

0.021 

0.031 

0.032 

0.039 

0.055 

0.107 

0.20 

0.034 

0.051 

0.060 

0.059 

0.077 

0.132 

0.30 

0.135 

0.145 

0.189 

0.203 

0.261 

0.282 

0.40 

0.272 

0.294 

0.356 

0.362 

0.412 

0.423 

Table  2  Robustness  of  the  rejection  techniques  with  respect  to  initial  error.  The  perfor¬ 
mance  is  measured  in  term  of  the  mean  final  of  XY  alignment  error  of  point  clouds  (in 
meter). 


4-2  Real-world  application 

The  last  section  described  experiments  with  outliers  mainly  due  to  dynamical 
obstacles.  Another  main  source  of  outliers  can  be  created  by  low  overlapping 
percentage  of  scans.  The  Canadian  Space  Agency  (CSA)  uses  a  rotating  laser 
range  finder  installed  on  a  robot  to  test  Mars  exploration  algorithms  which 
is  showed  in  Fig.  4  (a).  The  robot  typically  moves  few  meters  on  a  simulated 
Martian  terrain,  takes  a  3D  scan  and  decides  where  to  go  next.  In  this  kind 


of  application,  overlapping  between  scans  can  vary  between  50%  and  90% 
and  few  3D  features  are  available,  making  the  registration  very  sensible  to 
outliers.  The  RMT  was  used  on  3D  point  clouds  extracted  within  this  context 
of  application.  Fig.  4  presents  the  result  of  one  registration  with  a  distance  of 
15  m  between  the  two  scans.  The  grayscale  surface  correspond  to  the  section 
used  for  the  registration.  The  maximum  height  of  the  surface  is  about  1  m. 
Points  on  both  side  of  the  surface  represent  outliers  removed  during  the  regis¬ 
tration.  This  demonstrates  that  the  RMT  can  also  deal  with  outliers  created 
by  low  overlapping  scans.  Moreover,  the  algorithm  is  currently  used  by  the 
Space  Technologies  Research  Program  of  the  CSA  for  complete  mapping  of 
the  experimental  Martian  terrain. 


(a)  (b) 


Fig.  4  (a)  Robot  and  environment  of  the  Canadian  Space  Agency  for  the  Mars  exploration 
project,  (b)  RMT  applied  to  scans  with  low  overlapping.  In  grayscale,  the  surface  recovered 
from  match  points.  In  red  and  blue,  the  outliers  of  each  scan. 


5  Conclusion  and  Future  Work 

This  paper  presents  a  novel  rejection  technique  called  RMT  in  the  context  of 
ICP  registration  applied  to  SLAM  in  mobile  robotics.  Results  show  promising 
performance,  making  RMT  a  very  interesting  alternative  to  other  rejection 
techniques.  In  particular,  RMT  allows  better  registration  with  point  clouds 
containing  dynamical  obstacles.  RMT  also  demonstrates  its  applicability  in 
a  Mars  exploration  context  with  low  overlapping  percentages.  It  also  gives 
good  results  for  identifying  dynamical  obstacles.  In  future  work,  we  plan  to 
characterize  the  stability  of  the  approach  in  a  complete  SLAM  algorithm, 
and  to  further  extend  the  range  of  initial  error  that  ICP  can  resolve. 
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Rover-Based  Surface  and  Subsurface  Modeling 
for  Planetary  Exploration 
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Abstract  We  develop  and  test  a  technique  for  the  creation  of  coupled  surface  and 
subsurface  models.  Images  from  a  stereo  camera  are  used  to  estimate  the  motion 
of  a  rover  that  is  collecting  ground  penetrating  radar  (GPR)  data.  The  motion  es¬ 
timate  and  raw  sensor  data  are  used  to  build  two  novel  data  products:  (1)  A  three- 
dimensional,  photorealistic  surface  model  coupled  with  a  ribbon  of  GPR  data,  and 
(2)  a  two-dimensional,  topography-corrected  GPR  radargram  with  the  reconstructed 
surface  topography  plotted  above.  Each  result  is  derived  from  only  the  onboard  sen¬ 
sors  of  the  rover,  as  would  be  required  in  a  planetary  exploration  setting.  These 
techniques  were  tested  using  data  collected  in  a  Mars  analogue  environment  on  De¬ 
von  Island  in  the  Canadian  High  Arctic.  GPR  transects  were  gathered  over  polygo¬ 
nal  patterned  ground  similar  to  that  seen  by  the  Phoenix  lander  on  Mars.  Using  the 
techniques  developed  here,  scientists  may  remotely  explore  the  interaction  of  the 
surface  topography  and  subsurface  structure  as  if  they  were  on  site. 


1  Introduction 

The  use  of  ground  penetrating  radar  (GPR)  together  with  a  stereo  camera  on  plan¬ 
etary  exploration  rovers  has  been  proposed  several  times  [1,7]  and  is  now  in  de¬ 
velopment  for  the  European  Space  Agency’s  (ESA)  ExoMars  project  (2014)  [22]. 
Used  together,  surface  and  subsurface  imaging  will  aid  in  the  search  for  liquid  wa¬ 
ter  and  evidence  of  life.  The  ESA  mission  proposes  using  the  stereo  camera  for  site 
selection  and  survey,  while  the  GPR  will  then  be  used  to  characterize  the  subsurface 
stratigraphy,  and  to  select  sites  for  drilling. 
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Despite  this  interest,  there  are  still  several  open  issues  regarding  the  use  of  GPR 
on  a  rover  platform: 

1 .  Rovers  must  be  able  to  deliver  information  about  the  surface  (topography,  sub¬ 
strate  particle  size  distribution,  and/or  the  presence  of  any  existing  outcrops)  that 
enables  the  operator  to  give  local  geologic  context  to  the  subsurface  data.  The 
location  of  the  GPR  traverse  must  be  known  with  respect  to  the  surface  data  cap¬ 
tured  by  the  rover,  so  that  the  scientific  interpretation  of  the  data  is  as  close  as 
possible  to  a  direct  (human)  site  survey. 

2.  For  a  more  complete  interpretation  of  GPR  data,  the  radargram  (i.e.,  the  two- 
dimensional  subsurface  profile)  should  be  corrected  for  topography  (e.g.,  [15]). 
As  planetary  exploration  rovers  have  no  access  to  a  global  positioning  system 
(GPS)  equivalent,  topographic  profiles  must  be  generated  using  other  onboard 
sensors. 

3.  A  flight-ready  GPR  antenna  must  satisfy  size,  mass  and  power  consumption  con¬ 
straints  and  the  integration  must  minimize  interference  from  the  rover’s  motors 
and  metal  chassis. 

This  paper  addresses  items  1  and  2  by  using  stereo  imagery  to  enhance  the  GPR 
data.  Stereo  cameras  have  been  deployed  on  the  Mars  Exploration  Rovers  and  are 
planned  for  both  the  Mars  Science  Laboratory  (2011)  and  the  ExoMars  Mission 
(2014)  [22].  Visual  odometry  (VO)[17,  20,  3,  11] — full  6-degree-of-freedom  mo¬ 
tion  estimation  using  a  stereo  camera  as  the  primary  sensor — is  central  to  the  work 
described  in  this  paper.  Our  visual  odometry  algorithm  produces  motion  estimates 
with  accuracy  between  0.5%  and  5.3%  of  distance  traveled. 

On  Earth,  producing  a  site  survey  using  GPR  on  rough  terrain  involves  several 
manual  steps: 

1.  Place  fiduciary  markers  (e.g.,  flags)  along  the  intended  transect  and  survey  their 
locations  (e.g.,  using  differential  global  positioning  (DGPS)). 

2.  Drag  the  antenna  along  the  transect  at  a  constant  speed  to  collect  many  GPR 
traces,  manually  inserting  a  mark  into  the  data  to  note  the  time  at  which  the 
antenna  passes  each  fiduciary  marker. 

3.  Linearly  interpolate  these  manually-generated  markers  to  correct  the  horizontal 
spacing  of  the  GPR  traces  along  the  transect. 

4.  If  the  surface  is  not  flat,  correct  the  vertical  offset  of  the  GPR  traces  using  surface 
topography  manually  collected  with  a  DGPS  (Step  1). 

5.  Concatenate  the  corrected  traces  into  a  raster  image  called  a  radargram. 

Our  technique  uses  a  VO  estimate  to  fully  automate  this  procedure.  Further,  we  pro¬ 
duce  two  novel  data  products  that  may  be  used  to  explore  the  interaction  of  surface 
topography  and  subsurface  structure:  (1)  A  three-dimensional,  photorealistic  surface 
model  coupled  with  a  ribbon  of  GPR  data,  and  (2)  a  two-dimensional,  topography- 
corrected  GPR  radargram  with  the  reconstructed  surface  topography  plotted  above. 

These  techniques  have  been  tested  using  data  gathered  at  two  sites  near  the 
Haughton-Mars  Project  Research  Station  (HMPRS)  on  Devon  Island,  Nunavut, 
Canada.  The  sites  exhibit  polygonally  patterned  ground,  a  periglacial  landform  often 
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indicative  of  subsurface  ice  deposits  [16].  Stereo  images  were  captured  during  GPR 
transects  and  our  integrated  surface/subsurface  modeling  techniques  were  applied 
to  the  resulting  data. 

The  rest  of  the  paper  is  organized  as  follows.  Our  coupled  surface  and  subsurface 
modeling  system  is  described  in  Section  2.  Sections  3  and  4  outline  our  field  tests  on 
Devon  Island  and  the  associated  results.  Our  conclusions  are  provided  in  Section  5. 


2  Integrated  Surface  and  Subsurface  Modeling 


Fig.  1  An  overview  of  the  major  processing  blocks  of  our  system. 


This  section  will  describe  our  integrated  surface/subsurface  modeling  system. 
Data  flow  through  the  main  processing  blocks  of  our  system  can  be  seen  in  Fig¬ 
ure  1 .  The  images  captured  from  a  calibrated  stereo  camera  are  first  undistorted  and 
rectified.  This  process  accounts  for  lens  distortion  and  aligns  the  images  as  if  they 
came  from  perfect  pinhole  cameras  with  parallel  optical  axes. 

Our  algorithm  uses  Speeded-Up  Robust  Features  (SURF) — an  algorithm  to 
detect  and  describe  scale-and-rotation-invariant  features  [4] — for  both  matching 
(across  stereo  pairs)  and  tracking  (over  time).  This  is  a  class  of  feature  pioneered 
by  Lowe  [14].  Lowe’s  Scale-Invariant  Feature  Transform  (SIFT)  algorithm  has  been 
used  previously  for  object  recognition  [14],  simultaneous  localization  and  mapping 
[6],  and  visual  odometry  [3].  SURF  is  a  similar  algorithm  that  is  much  faster  to  com¬ 
pute  because  it  uses  integral  images  to  approximate  the  operations  used  by  SIFT  to 
find  and  describe  features.  After  two  consecutive  stereo  pairs  have  been  matched, 
features  are  tracked  between  frames.  Feature  descriptor  matches  between  the  con¬ 
secutive  left  images  are  used  as  candidate  tracks.  We  use  a  version  of  RANSAC  [5] 
to  simultaneously  reject  outlier  feature  tracks  and  produce  a  coarse  motion  estimate 
that  is  used  to  initialize  our  maximum  likelihood  solution. 

Our  maximum  likelihood  solution  is  similar  to  the  one  developed  by  Matthies 
[18]  and  deployed  on  the  Mars  Exploration  Rovers  [17].  At  each  timestep,  N  tracked 
features  pass  the  outlier  rejection  stage.  For  each  feature  i,  we  triangulate  the  three- 
dimensional  location  of  the  point  in  each  of  the  two  consecutive  stereo  images.  This 
results  in  a  pair  of  points  p^1  and  pj>2.  As  the  world  is  assumed  to  be  rigid,  we  now 


4 


Paul  Furgale,  Tim  Barfoot,  and  Nadeem  Ghafoor 


seek  the  rotation  C12  and  translation  p^1  that  align  these  two  point  clouds.  This 
results  in  the  objective  function,  /,  which  we  seek  to  minimize: 

J(Cl2,pf)  :=  (Pi1  -  (C12P f +  P?1))rW;-  (pi1  -  (CnpZ  +  pj1))  (D 

Z  i=  1 

where  W )  is  a  weighting  matrix.  We  use  the  inverse  covariance  of  :=  p*/  — 
(C12P22  +  p21)  for  W i,  and  thus  /  is  a  Mahalonabis  distance,  and  finding  the  vari¬ 
ables  that  minimize  J  also  maximizes  the  joint  likelihood  of  all  the  data.  For  further 
details,  please  refer  to  [8]. 

The  motion  estimates  between  each  consecutive  pair  of  images  are  then  stacked 
up  to  give  an  estimate  of  the  rover’s  entire  traverse.  As  the  robot  is  rigid,  we  obtain 
the  transformation  from  the  camera  frame  to  the  GPR  frame  through  calibration, 
and  so  the  visual  odometry  estimate  also  gives  us  the  position  of  the  GPR  at  each 
point  along  the  traverse.  Knowing  the  position  of  the  camera  and  the  GPR,  we  can 
transform  all  of  the  raw  data  into  a  common  coordinate  frame,  This  gives  us 
the  following  intermediate  data  products,  all  expressed  in  J^o- 

1.  an  estimate  of  the  rover’s  position  for  each  stereo  image, 

2.  the  sparse  points  used  to  compute  the  motion  estimate, 

3.  larger  point  clouds  for  each  stereo  image  obtained  from  dense  stereo  processing, 

4.  the  position  of  the  GPR  at  each  data  collection  point. 

These  intermediate  results  are  used  to  build  the  higher-level  data  products  described 
below. 


2.1  Three-Dimensional  Surface  and  Subsurface  Model 


The  first  data  product  is  a  photorealistic,  three-dimensional  model  of  the  surface, 
coupled  with  a  model  of  the  subsurface.  Point  sets  derived  from  dense-stereo  pro¬ 
cessing  are  aligned  using  the  VO  motion  estimate  [21].  The  resulting  point  cloud 
is  meshed  and  mapped  with  texture  from  the  original  images  [2].  The  GPR  scan 
is  modeled  as  a  ribbon  running  under  the  surface  mesh.  The  known  transforma¬ 
tion  between  the  stereo  camera  and  the  GPR  antenna  is  used  to  couple  the  surface 
and  subsurface  models.  The  resulting  coupled  model  allows  geologists  to  inspect  a 
three-dimensional  representation  of  the  transect  and  explore  the  interaction  of  the 
surface  morphology  and  the  subsurface  scan. 


2.2  Two-Dimensional  Topography-Corrected  Radargram 

The  second  data  product  is  a  two-dimensional,  topography  corrected  radargram.  The 
position  and  attitude  of  the  antenna  at  each  GPR  trace  is  interpolated  from  the  VO 
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Distance  along  transect  (m) 

Fig.  2  Sparse  surface  points  and  the  spline  fit  along  the  GPR  antenna’s  path.  This  section  is  a 
polygon  trough  from  transect poly-2AS-l. 


estimate.  This  estimate  is  used  in  place  of  the  DGPS  survey  to  perform  both  the 
horizontal  correction  and  the  vertical  correction.  The  profile  of  the  surface  below 
the  antenna  is  estimated  by  fitting  a  spline  to  feature  locations  along  the  transect  as 
shown  in  Figure  2.  The  spline  improves  on  the  topographic  correction  as  it  is  able 
to  capture  narrow  features  over  which  the  rover  may  drive. 


3  Field  testing 

The  experiments  described  in  this  paper  were  conducted  on  Devon  Island  in  the 
Canadian  High  Arctic,  as  part  of  the  Haughton-Mars  Project.  The  Haughton-Mars 
Project  Research  Station  (HMPRS)  is  situated  just  outside  the  northwest  area  of 
the  Haughton  impact  crater,  which  is  located  at  75°22'  N  latitude  and  89°41/  W 
longitude.  Our  experiments  were  conducted  approximately  10  kilometers  northeast 
of  HMPRS  near  Lake  Orbiter.  This  site  was  selected  based  on  ongoing  research  into 
the  polygonal  terrain  it  hosts.  Image  sequences  from  the  stereo  camera  and  GPR 
data  were  logged  at  two  sites: 

1.  The  Lake  Orbiter  Transects:  Five  straight-line  transects  were  taken  at  the  Lake 
Orbiter  site  (Figure  3(a)).  Each  transect  is  approximately  60  meters  long. 

2.  The  Mock  Rover  Transect:  One  transect,  approximately  357  meters  long,  at  a  site 
that  had  not  been  previously  studied  (Figure  3(b)). 

In  our  experiments,  a  rover  was  simulated  using  a  pushcart  equipped  with  rover 
engineering  sensors  (i.e.,  stereo  camera,  inclinometers,  sun  sensor,  wheel  odome¬ 
ters),  a  ground-penetrating  radar,  an  on-board  computer,  and  two  independent  GPS 
systems  (one  Real-Time  Kinematic)  used  for  ground-truth  positioning  (see  Fig¬ 
ure  4).  Although  this  was  not  an  actuated  rover,  our  focus  in  this  work  is  on  prob¬ 
lems  of  estimation,  and  thus  it  was  entirely  sufficient  as  a  means  to  gather  data. 
The  GPR  (and  cart)  we  used  was  a  Sensors&Software  Noggin  250  MHz  system  [1]. 
Efforts  were  made  to  minimize  the  effect  of  the  rover  body  on  the  GPR  data  qual¬ 
ity  (e.g.,  using  plastic  parts  where  possible).  The  stereo  camera  was  a  Point  Gray 
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(a)  The  Lake  Orbiter  Transects.  (b)  The  Mock  Rover  Transect. 

15°29'35"N,  8 9°52'51"W.  15°2S'56"N,  89°52,11"W. 

Fig.  3  Locations  and  transects  on  Devon  Island,  Nunavut,  Canada  used  for  field  testing  our  inte¬ 
grated  surface/subsurface  modeling  technique. 


Research  Bumblebee  XB3  with  a  24  cm  baseline  and  70°  field  of  view,  mounted  ap¬ 
proximately  1  m  above  the  surface  pointing  downward  by  approximately  20° .  Each 
image  of  the  stereo  pair  was  captured  at  1280  x  960  pixel  resolution. 


Fig.  4  The  rover  platform 
used  for  field  testing. 


4  Results 


The  visual  odometry  algorithm  described  in  Section  2  was  used  to  process  all  data 
collected  at  the  Lake  Orbiter  site.  We  used  a  real-time  kinematic  GPS  unit  as  ground- 
truth  for  our  motion  estimate.  We  determine  the  initial  heading  through  a  least- 
squares  fit  of  the  estimated  track  to  the  GPS  for  a  small  number  of  poses  at  the  start 
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Fig.  5  Track  plots  of  GPS  and 
the  VO  estimate  for  the  357 
meter  Mock  Rover  Transect. 


Table  1  Visual  Odometry  motion  estimate  results. 

Transect  Distance  Traveled  (m)  Linear  Error  (m)  Percent  Error  Number  of  Images 


Mock  Rover 

357.30 

5.83 

1.63% 

4818 

poly- IAS- 1 

54.23 

2.87 

5.29% 

333 

poly-IBN-l 

59.63 

0.68 

1.14% 

316 

poly-lBS-1 

60.06 

1.24 

2.06% 

317 

poly-ICN-l 

60.67 

1.98 

3.26% 

327 

poly-2  AN- 1 

51.49 

1.04 

2.01% 

270 

poly-2AS-l 

50.16 

0.25 

0.51% 

263 

poly-2BN-l 

49.47 

1.16 

2.34% 

260 

poly-2BS-l 

49.05 

0.96 

1.96% 

258 

of  the  traverse.  These  poses  are  then  discarded  and  are  not  used  when  evaluating  the 
linear  position  error.  This  is  similar  to  the  method  used  by  [19]. 

The  results  are  shown  in  Table  1 ,  which  lists  the  distance  traveled  and  errors  for 
all  datasets  collected.  On  the  short  Lake  Orbiter  transects  (50  to  60  meters),  position 
errors  ranged  from  0.5%  to  5.3%  of  distance  traveled.  The  results  of  the  estimation 
on  the  Mock  Rover  Transect  are  plotted  in  Figure  4.  This  estimate  accumulated 
1.63%  position  error  over  this  357.3  meter  traverse. 

Our  results  approach  those  reported  by  other  frame-to-frame  VO  algorithms 
[10,  20],  and  we  believe  this  class  of  algorithm  is  suitable  for  applications  such  as 
this,  which  require  a  motion  estimate  over  a  short  distances.  To  use  VO  over  longer 
distances,  the  work  of  Konolige  et  al.  offers  insights  into  increasing  performance, 
albeit  at  the  cost  of  increased  computational  complexity  [11]. 


4.1  Coupled  Surface/Subsurface  models 

The  complete  coupled  surface/subsurface  model  of  the  Mock  Rover  Transect  is 
shown  in  Figure  6.  The  texture-mapped  triangle  mesh  of  the  surface  is  displayed 
above  the  ribbon  of  GPR  data.  The  model  may  be  inspected  using  a  Virtual  Reality 
Modeling  Language  viewer  and  rendered  from  any  viewpoint. 
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Trough  closeup  (side  view)  Trough  closeup  (top  view) 


Fig.  6  Screenshots  of  the  coupled  surface/subsurface  model. 


Distance  along  transect  (m) 


Fig.  7  GPR  transect  corrected  for  topography  with  surface  profile  plotted  above. 


Polygonal  terrain — a  network  of  interconnected  trough-like  depressions  in  the 
ground — is  a  landform  commonly  found  throughout  the  polar  regions  of  both  Earth 
[13]  and  Mars  [12].  In  terrestrial  environments,  these  features  are  often  indicative 
of  subsurface  ice  bodies  termed  ice  wedges  [16].  As  noted  by  Hinkel  et  al.  [9], 
ice  wedges  “produce  exceedingly  complex,  high  amplitude  hyperbolic  reflections” 
(p.187)  due  to  the  conical  shape  of  the  emitted  GPR  pulse.  As  a  result,  while  ice 
wedges  themselves  are  roughly  triangular  in  shape — wider  at  the  top  and  progres¬ 
sively  narrowing  with  depth — their  appearance  on  a  radargram  more  resembles  an 
inverted  hyperbola. 

Figure  7  shows  the  entire  corrected  GPR  radargram  produced  from  data  collected 
at  the  Mock  Rover  site.  Points  A-C  illustrate  three  such  examples  of  hyperbolic  sub¬ 
surface  reflections  detected  within  the  radargram.  At  these  and  other  locations  along 
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the  transect,  the  hyperbolic  reflectors  are  found  immediately  beneath  the  troughs 
as  indicated  by  small  V-shaped  depressions  in  the  stereo  camera  surface  profile. 
Because  polygon  troughs  are  the  most  obvious  surface  expression  of  ice  wedge 
locations  [16],  the  successful  coupling  of  our  surface/subsurface  model  is  further 
supported. 


5  Conclusion  and  Future  Work 

We  have  presented  a  coupled  surface/subsurface  modeling  method  for  planetary 
exploration.  Our  method  uses  only  a  stereo  camera  and  a  ground  penetrating  radar 
unit  to  produce: 

1.  An  estimate  of  the  rover’s  trajectory  over  the  course  of  the  traverse. 

2.  A  photorealistic  three-dimensional  surface/subsurface  model. 

3.  Topography-corrected  GPR  traces  plotted  with  a  two-dimensional  profile  of  the 
surface  along  the  transect. 

The  models  and  corrections  allow  operators  to  work  remotely,  surveying  the 
data  as  if  they  were  on  site.  Currently,  there  are  several  manual  steps  in  terres¬ 
trial  GPR  site  surveying.  Our  approach  allows  GPR  collection  to  be  carried  out  in 
an  automated  manner,  on  a  rover,  thereby  enabling  planetary  exploration.  Subsur¬ 
face  stratigraphy  can  be  examined  in  the  context  of  the  surface  morphology,  a  key 
scientific  technique  used  by  field  geologists  to  identify  sites  worthy  of  further  study. 

We  collected  our  data  in  a  Mars  analogue  environment  at  sites  of  scientific  in¬ 
terest.  Visual  odometry  estimates  were  produced  for  approximately  800  meters  of 
traverse.  The  coupled  models  were  generated  using  only  the  types  of  sensors  that 
are  slated  to  fly  on  future  rover  missions,  such  as  ExoMars. 

As  mentioned  in  Section  1,  a  flight  rover’s  metal  chassis  and  interference  from 
the  motors  may  corrupt  the  GPR  signal.  Our  future  work  will  include  a  return  to 
Devon  Island  in  July,  2009  to  address  this  issue  by  direct  comparison  of  GPR  data 
collected  with  and  without  an  actuated  rover  platform. 
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Abstract  User  guidance  systems  are  relevant  to  various  applications  of  the  service 
robotics  field,  among  which:  smart  GPS  navigator,  robotic  guides  for  museum  or 
shopping  malls  or  robotic  wheel  chairs  for  disabled  persons.  Such  a  system  aims 
at  helping  its  user  to  reach  its  destination  in  a  fairly  complex  environment.  If  we 
assume  the  system  is  used  in  a  fixed  environment  by  multiple  users  for  multiple 
navigation  task  over  the  course  of  days  or  weeks,  then  it  is  possible  to  take  advantage 
of  the  user  routine:  from  the  initial  navigational  choice,  users  can  be  identified  and 
their  goal  can  be  predicted.  As  a  result  of  these  prediction,  the  guidance  system 
can  bring  its  user  to  its  destination  while  requiring  less  interaction.  This  property  is 
particularly  relevant  for  assisting  disabled  person  for  whom  interaction  is  a  long  and 
complex  task.  In  this  paper,  we  implement  a  user  guidance  system  using  a  dynamic 
Bayesian  model  and  a  topological  representation  of  the  environment.  This  model  is 
evaluated  with  respect  to  the  quality  of  its  action  prediction  in  a  scenario  involving 
4  human  users,  and  it  is  shown  that  in  addition  to  the  user  identity,  the  goals  and 
actions  of  the  user  are  accurately  predicted. 

1  Introduction 


Robots  are  more  and  more  present  in  the  daily  life,  not  only  in  the  industry  but  also 
at  home  as  toys  or  as  service  robots  such  as  vacuum  cleaners.  There  is  also  a  grow¬ 
ing  demand  in  the  health-care  domain  for  smart  assistive  device  such  as  intelligent 
wheelchairs.  The  present  paper  is  focused  on  an  intelligent  system  designed  to  help 
the  elderly  or  disabled  people  in  their  daily  activities.  For  these  people,  moving  in 
their  houses  or  passing  through  doorways  may  represent  challenging  tasks.  We  de¬ 
veloped  a  semi-autonomous  robot  for  improving  user  mobility  while  minimizing  the 
required  input,  i.e.  having  an  interaction  process  adapted  to  low  throughput  devices 
such  as  single  switches,  sip  and  puff  systems,  brain  machine  interfaces,  or  simple 
voice  recognition.  More  precisely,  at  each  crossing,  the  robot  proposes  a  direction  of 
travel  to  the  user  who  will  then  either  agree  or  disagree.  The  better  the  propositions 
are,  the  faster  and  easier  the  human-robot  interaction  is. 
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In  this  work,  a  dynamic  Bayesian  network  (DBN)  is  used  in  order  to  learn  the 
habits  of  multiple  human  users  of  a  robotic  helper.  By  habits ,  we  mean  the  suc¬ 
cession  of  navigational  tasks  one  executes  in  a  known  environment:  in  a  retirement 
home  for  instance,  one  resident  wakes  up  in  his  bedroom,  goes  to  the  bathroom,  then 
to  the  common  room  where  the  breakfast  is  served,  and  then  goes  to  other  rooms  for 
his  daily  activities.  Another  resident  will  have  other  preferred  locations  for  his  own 
activities.  All  these  accumulated  additional  information  allow  the  robotic  assistant 
to  help  its  current  human  user  from  the  first  movements  of  a  new  travel  until  the 
destination.  From  the  system’s  point  of  view,  the  actual  identity  of  its  user  is  not  a 
relevant  information.  A  user  is  merely  defined  by  his  activity  pattern. 

In  the  next  section,  we  describe  related  works  in  recognition  techniques  for  goal, 
user,  or  activity.  In  section  3,  we  detail  the  developed  DBN,  represented  as  a  graph¬ 
ical  model.  Section  4  presents  experiments  in  simulation  as  well  as  with  a  real  robot 
and  their  results.  The  final  discussion  appears  in  section  5. 

2  Related  Works 

Our  aim  is  to  ease  the  navigation  of  users,  therefore  we  will  consider  activity  recog¬ 
nition  only  from  the  point  of  view  of  navigation.  In  this  case,  inferring  the  user’s 
intention  requires  techniques  for  plan  recognition,  which  are  used  in  a  broad  variety 
of  domains,  such  as  motion  prediction,  speech  understanding,  video  surveillance, 
and  so  on.  Uncertainty  is  inherent  in  plan  inference,  as  the  robot  does  not  know  in 
advance  the  intended  destination  of  a  user.  Furthermore,  many  ways  can  lead  to  the 
same  destination  while  one  way  can  lead  to  several  places.  Probabilistic  reasoning 
techniques  are  used  in  almost  all  works  (review  in  [1]),  as  they  help  to  express  and 
maintain  the  beliefs  in  the  possible  goal  destinations. 

In  the  particular  domain  of  intention  recognition  for  navigation,  two  aspects  were 
studied  recently:  local  intention  recognition  (immediate  action  or  location  in  the 
vicinity  of  the  wheelchair  from  uncertain  input)  and  global  intention  recognition 
(goal  destination  from  local  decision).  Our  work  focuses  on  this  latter  issue,  assum¬ 
ing  that  the  recognition  of  the  immediate  intention  is  solved.  In  our  test,  this  will  be 
achieved  by  an  interaction  device  with  low  uncertainty,  such  as  a  joystick  or  a  reli¬ 
able  speech  recognition.  Inferring  a  global  intention  requires  the  ability  to  localize 
the  robot  on  an  available  map  of  the  environment.  In  a  discrete  environment  repre¬ 
sented  by  adjacent  cells,  Verma  and  Rao  [12]  described  a  Dynamic  Bayesian  Net¬ 
work  (DBN)  in  the  form  of  a  graphical  model  composed  of  a  Partially  Observable 
Markov  Decision  Process  (POMDP)  enhanced  with  the  notion  of  goal  locations.  Re¬ 
lying  on  reinforcement  learning  techniques,  the  agent  explores  how  it  should  behave 
in  order  to  reach  three  possible  goals.  Based  on  this  acquired  knowledge,  the  system 
could  infer  the  most  probable  goal  from  a  set  of  possibly  noisy  observations.  Taha  et 
al.  [10]  achieved  similar  results  by  fusing  the  robot  position  and  the  possible  goals 
as  the  new  state  definition  in  a  common  POMDP.  Vasquez  [11]  described  a  growing 
hidden  Markov  model  algorithm  dedicated  to  continuous  learning,  clustering,  and 
making  inference  about  car  motions  in  a  parking  lot  or  people  motions  in  a  hallway. 
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Topological  maps  were  successfully  used  in  the  works  of  Taha  [10].  This  envi¬ 
ronment  representation  is  compact  and  matches  the  human’s  natural  description  of 
a  path  better  than  metric  representations  [7]  (e.g.  “go  on  the  left  at  the  second  cross¬ 
ing”  instead  of  “go  straight  for  100  meters,  then  turn  left”).  Many  techniques  exist 
for  the  map  construction,  e.g.  the  generalized  Voronoi  diagram,  and  applications 
based  on  imprecise  human  drawings  have  also  been  reported  [9] . 

User  models  are  often  used  in  computer  applications  (e.g.  intelligent  help)  or 
online  search  or  e-commerce  sites  (e.g.  book  recommendation  based  on  other  cus¬ 
tomer’s  choices)  [8].  Based  on  databases  containing  information  from  numerous 
users  and  some  observations  from  the  current  user  activity,  reasoning  techniques  in¬ 
fer  the  next  user  action  and  try  to  help  him.  With  the  aging  population,  some  research 
domains  are  focused  on  the  activity  recognition  in  so-called  smart  environments,  i.e. 
environments  where  sensors  have  been  installed  in  order  to  monitor  the  activity  of 
human  beings.  The  accumulated  data  are  used  to  train  algorithms  which  later  serves 
to  determine  human  activities,  supervise  the  user’s  condition  and  medication,  or  de¬ 
tect  anomalies  [2,  4,  6,  13].  Some  researches  try  to  further  determine  the  attributes 
(coffee  drinker,  smoker)  of  multiple  users  based  on  their  location  [3,  5]. 

As  a  summary,  the  studies  on  activity  recognition  perform  well  in  capturing  the 
user  habits,  but  they  can  only  monitor  ongoing  activities,  not  help  the  user  to  per¬ 
form  a  task.  On  the  other  hand,  the  studies  on  intention  recognition  for  facilitating 
the  user’s  motion  do  not  try  to  learn  the  typical  user’s  daily  habits.  In  this  article, 
we  propose  a  system  that  learns  the  habits  of  multiple  human  users  controlling  a 
robot  and  exploit  this  knowledge  for  the  navigational  control  of  the  robot.  In  case 
of  an  unknown  user,  the  same  system  first  infers  the  identity  of  the  user  from  the 
first  movement  in  the  environment  before  being  able  to  exploit  this  knowledge  for 
helping  its  user  specifically.  The  robotic  system  is  given  a  topological  representa¬ 
tion  of  the  environment  a  priori.  While  incrementally  learning  the  global  intentions 
of  a  known  user,  it  anticipates  the  user’s  destination  and  proposes  better  actions,  as 
regular  patterns  of  actions  are  performed  day  after  day  by  the  same  user. 

3  Model  description 


The  graphical  model  shown  in  figure  1  represents  the  dynamic  Bayesian  network 
which  composes  the  core  of  the  system.  Our  DBN  is  an  extension  of  a  Markov  De¬ 
cision  Process  (MDP),  with  the  agent’s  State  St  and  the  Action  At  (lower  part).  The 
state  St  is  the  observed  pose  of  the  robot,  in  terms  of  position  in  a  given  topological 
map  (a  node)  and  orientation.  1  The  action  At  is  among  the  repertoire:  forward,  left, 
right,  u-turn,  and  stop.  An  action  always  connects  to  nodes  of  the  map.  The  MDP  is 
enhanced  by  the  notion  of  Goal  state  Gt ,  indicating  the  current  global  goal,  and  the 
variable  Goal  reached  Rt ,  a  Boolean  value  indicating  whether  the  agent’s  current 
state  St  is  a  goal  state  or  not  (P(Rt\Gt  St)  =  1  iff  Gt  =  St). 


1  In  relatively  small  and  not  too  dynamic  environments  as  the  ones  considered  for  this  work,  local¬ 
ization  can  be  considered  as  a  solved  issue. 
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Fig.  1  Our  dynamic  Bayesian  model  for  learning  the  habits  of  multiple  users. 


The  term  P(At\St  Gt  Rt)  is  specified  separately  according  to  whether  or  not  the 
goal  is  reached.  First,  when  Rt  =  0,  P(At\St  Gt)  is  the  action  model  leading  to  each 
goal  Gt  from  any  state  St  as  deduced  from  the  topological  map  (i.e  no  reinforcement 
learning  technique  used).  Then,  when  Rt  =  1,  the  probability  of  staying  still  P(At  = 
stop)  is  higher  than  the  other  actions,  so  that  the  user  can  easily  stop  when  it  reaches 
a  goal  location.  The  last  term  of  the  MDP,  the  transition  model  P(St\St~i  A*_i),  is 
also  deduced  from  the  topological  map. 

With  the  goal  of  learning  the  user  habits  for  multiple  users,  we  added  two  more 
variables.  First,  the  user  is  symbolized  with  the  variable  Ut.  Second,  the  variable 
Giast,  representing  the  last  visited  goal,  makes  possible  the  learning  of  the  succession 
of  place  visits,  as  will  be  shown  later.  Both  the  goal  model  P(Gt\Gt-i  Rt- 1  Ut~  i  G[ast ) 
and  the  user  model  P(Ut\Ut-\  Gt~ i  Giast)  result  from  submodels,  where  predefined 
terms  related  to  the  persistence  of  a  goal  (keeping  the  same  goal  or  switching  to 
another  one),  resp.  the  persistence  of  a  user,  are  combined  with  terms  learned  online 
representing  the  knowledge  acquired  by  the  system.  These  submodels  are  described 
in  details  in  the  appendix.  When  reaching  a  goal  during  an  online  supervised  learn¬ 
ing  phase,  the  known  user  and  the  particular  Gt  and  Giast  are  used  for  updating  the 
histograms  of  the  probability  distributions  of  the  goal  and  user  models. 

From  the  model  shown  in  figure  1 ,  we  compute  at  each  time  step  several  proba¬ 
bility  distributions.  After  the  learning  phase,  the  user  is  not  known  any  more.  Never¬ 
theless,  given  the  known  variables  St-uAt-\9  and  Giast,  we  can  infer  his  identity  by 
computing  P(Ut\St-i  At-\  Giast )  using  Bayes’  rule  and  the  law  of  total  probability: 

P(Ut\St-l  At-i  Giast)  (1) 

P(Gt- 1)  Y  [P(Ut- 1)  P(Ut\Ut-x  Gt- 1  Giast)] 
ut- 1 

X  Y  [P(Rt-i\Gt-i  St-\)P(At-\ \Gt-i  St- 1  Rt- 1)] 

Rt- i 

where  the  term  ^  corresponds  to  a  normalization  factor.  For  a  guidance  robot,  we 
also  want  to  infer  the  goal  the  robot  should  be  aiming  to  or  the  action  it  should 
propose  to  the  unknown  user.  These  two  distributions  can  be  computed  as  follows: 


P{Gt\St- 1  At- 1  Giast) 

^  1  P(Gt- 1)  Y 
Rt-i 


P(Rt-i\Gt-i  St-\)P(At-i \Gt-\  St- 1  Rt- 1) 
x  Y  {P(Ut-i)P(Gt\Gt-i  Rt- 1  Ut- 1  Glast )} 
ut- 1 


(2) 
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(a) 


(b) 


Fig.  2  (a)  The  differential  drive  robot  used  in  our  experiments,  (b)  Example  of  action  proposed  to 
the  user,  who  then  either  agrees  of  disagrees  orally  (use  of  a  speech  recognition  software). 


P(At\St-\  At-i  Giast) 

P{St\At^i  St-\) 

[P(Gm) 

/ 


(3) 


If  the  user  is  known  by  the  robot  (e.g.  through  manual  or  visual  identification), 
we  can  use  a  Dirac  distribution  for  P(Ut~ i)  and  recompute  all  the  above  equations 
with  this  additional  information,  the  system  giving  back  the  learned  user  habits. 


4  Experiments 

We  run  experiments  of  our  multi-user  guidance  robot  in  our  laboratory  environment, 
using  a  differential-drive  robot  (fig.  2a).  Figure  3  shows  that  the  topological  decom¬ 
position  of  the  environment  is  made  of  goal  nodes  and  connecting  nodes.  The  former 
ones  represent  either  people’s  desks  (D1-D6)  or  common  rooms  like  the  cafeteria 
(C),  the  printer  room  (P),  the  robot  lab  (RL),  or  the  bathroom  (B).  Four  users  share 
the  robot,  each  with  a  particular  desk  and  typical  sequence,  all  starting  from  the 
entrance  E.  User  1  executes  the  sequence  D3-D4-P-C-D3-D2-D3,  user  2  D4-RL-C- 
D3-D4-RL-D4,  user  3  D2-C-D2-D3-D2-P-D2,  and  user  4  D6-RL-C-D4-D6-RL-D6 
(fig.  4a-d).  Additionally  each  user  goes  to  the  bathroom  at  a  random  point  in  his  se¬ 
quence.  These  sequences  are  repeated  20  times  during  the  learning  phase.  In  order 
to  speed  up  this  process,  we  used  a  simulation.  However,  for  the  tests  on  the  21th 
day,  a  real  robot  was  used.  2  In  a  second  phase,  the  real  robot  was  used  by  each  of 
the  user,  the  robot  having  no  preliminary  knowledge  about  the  current  user.  At  each 
time  step,  the  system  computes  the  probability  distribution  over  the  user,  the  goal, 
and  the  action  with,  respectively,  equations  1,  2,  and  3.  This  last  distribution  is  used 


2  In  simulation,  the  robot  moves  instantaneously  to  the  next  node,  the  inputs  to  our  DBN  St-\ , 
At- 1,  and  Giast  being  identical  to  when  using  the  real  robot. 
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Fig.  3  Topological  map  of  our  laboratory  used  for  the  experiments  with  an  entrance  (E),  desks 
(D1-D6),  a  printer  room  (P),  a  cafeteria  (C),  a  robot  lab  (RL),  and  toilets  (T). 


Fig.  4  Habit  description  of  the  four  users.  The  succession  of  visited  places  is  marked  with  increas¬ 
ing  numbers.  Each  user  visits  the  bathroom  (B)  at  random  among  the  sequence. 


to  propose  an  action  to  the  user  (see  fig.  2b),  who  can  confirm  or  not.  In  this  precise 
experiment,  this  was  achieved  through  a  speech  recognition  system.  3 

3  The  software  performed  well  enough  to  assume  a  perfect  recognition  of  the  user  agreement 
(“Yes”)  or  disagreement  (“No”). 
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Fig.  5  User  probability  when  (a)  user  1,  (b)  user  2,  (c)  user  3,  (d)  user  4  was  steering  the  robot. 
The  dotted  lines  indicate  the  steps  where  a  user  stopped  at  a  goal. 


Results  of  the  user  recognition  are  shown  in  figure  5,  which  present  the  evolution 
of  the  user  probability  distribution  during  a  trial  of  the  test  phase.  Starting  from 
a  uniform  distribution,  the  actual  user  is  correctly  inferred  as  soon  as  a  revealing 
action,  mostly  leading  towards  his  desk,  is  executed.  In  the  scenarios,  the  different 
user  habits  shared  some  common  goals,  or  even  some  common  sequences  of  goals. 
These  similarities  can  be  seen  in  figure  5,  where  the  probability  in  the  most  probable 
user  decreases  and  the  one(s)  for  other  user(s)  increases  (e.g.  fig  5b,  the  sequence 
between  goals  D3  &  D4  is  shared  by  users  1  &  2).  We  can  also  notice  that  when 
a  user  goes  to  the  bathroom,  his  inferred  probability  decreases  slightly.  As  every 
user  can  go  to  the  bathroom  at  any  time,  this  is  not  a  discriminant  observation,  the 
models  tending  thus  slightly  to  a  uniform  due  to  the  transition  in  the  user  submodel. 

For  quantifying  our  model,  we  compare  the  model  prediction  of  the  user,  the 
goal,  and  the  action  with  the  real  values.  Based  on  their  distributions  computed  at 
each  time  step,  we  can  check  if  its  maximum  is  the  actual  user,  intended  goal  and 
proper  action  proposition.  We  thus  introduce  a  measure  jJ,max,  being  the  mean  of  the 
number  of  times  the  max  of  a  distribution  matches  the  real  value.  For  example,  the 
formula  for  the  goal  is  ^  L=i  s (A ,  argmaxGf  P(Gt\St- 1  Af_i  Giast ) )  ,  with 

N  the  number  of  steps,  Gt  the  intended  goal,  and  8(a,b)  the  Kronecker  function. 
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l^max 

Standard  usage 

User  1  User  2  User  3  User  4 

Robot  taken  from  user  2 
User  3  User  4 

User 

0.89 

0.91  0.94 

0.96 

0.76 

0.56 

Goal 

0.64 

0.63  0.57 

0.84 

0.54 

0.52 

Action 

0.94 

0.93  0.94 

0.96 

0.89 

0.92 

Table  1  Model  performance  for  the  user,  goal,  and  action  predictions  of  each  of  the  four  single 
user  scenarios  (left  columns)  and  the  two  taking-over  scenarios  (right  columns). 


The  results  are  displayed  in  table  1 .  As  can  be  seen  when  comparing  with  figure  5, 
the  user  are  properly  recognized,  the  differences  being  explained  by  the  amount  of 
steps  until  a  revealing  action.  Concerning  the  goals  and  the  actions,  their  proba¬ 
bilities  are  related  to  the  learned  habits  of  the  users  and  are  also  influenced  by  the 
geographical  location  of  the  visited  goals.  As  an  example,  user  3  visited  three  goals 
on  the  right  when  leaving  his  office,  while  the  fourth  one  was  on  the  left.  Accord¬ 
ingly,  the  probability  of  going  right,  resp.  of  the  three  right  goals  were  higher  than 
going  left  to  the  bathroom.  Furthermore,  we  can  notice  that  the  action  inference  is 
much  better  than  the  goal  inference.  Indeed,  most  of  the  time,  the  goal  confusion 
is  between  two  nearby  goals  that  share  a  part  of  the  sequence.  As  a  consequence, 
the  goal  could  be  erroneously  inferred  during  the  first  steps  of  the  travel,  while  the 
action  inference  will  be  wrong  only  at  a  branching  node. 

Finally,  we  want  to  assess  the  robustness  of  our  system.  To  this  end,  we  run 
scenarios  where  two  users  took  over  the  robot  when  user  2  reaches  the  cafeteria  (C). 
First,  the  results  with  user  3  are  displayed  in  figure  6a.  As  can  be  seen,  while  leaving 
C,  the  robot  thought  user  2  was  going  to  D3,  but  also  increased  the  probability  in 
user  1,3,  and  4  as  they  all  go  in  that  direction  after  being  in  C.  At  each  revealing 
action,  the  probability  in  specific  users  changes.  But  when  reaching  D2,  user  3  starts 
being  the  most  probable  one.  The  increasing  probability  of  user  1  is  explained  by 
the  shared  common  goals  with  user  3,  and  starts  to  decline  as  soon  as  user  3  comes 
back  to  D2.  In  figure  6b,  the  results  when  user  4  takes  the  robot  from  user  2  are 
displayed.  As  user  4  first  goes  to  the  desk  of  user  2,  this  sequence  is  not  revealing 
enough  because  only  the  former  action  differs  from  the  habit  of  user  2.  Then,  going 
to  the  toilets  does  not  bring  any  information  about  the  user’s  identity.  Finally,  when 
user  4  reaches  his  desk,  the  remaining  sequences  are  specific  enough  for  the  system 
to  infer  its  user  correctly.  Overall,  despite  a  performance  loss  in  both  goal  and  user 
recognition,  table  1  shows  that  the  actions  proposed  to  the  user  are  still  relevant. 

5  Conclusion 

In  this  paper,  we  introduced  a  probabilistic  model  able  to  learn  the  daily  habits  of 
different  users  of  a  guidance  robot.  When  the  robot  accumulated  enough  knowledge 
about  a  particular  user,  it  is  able  to  actively  help  its  user  to  reach  a  goal  destination. 
As  shown  in  the  experiments  starting  with  an  unknown  user,  the  same  model  is  able 
to  infer  the  identity  of  the  user  after  just  a  few  revealing  actions  have  been  made,  or 
specific  goals  have  been  reached.  Having  then  a  high  confidence  in  a  particular  user, 
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Fig.  6  Scenarios  where  (a)  user  3  and  (b)  user  4  take  the  robot  from  user  2  in  the  cafeteria.  The 
user  probability  is  displayed.  The  dotted  lines  indicate  the  steps  where  a  user  stops  at  a  goal.  The 
black  region  indicates  when  a  new  user  is  steering  the  robot. 


the  system  is  again  able  to  actively  help  its  user.  Furthermore,  experiments  where 
a  different  user  takes  over  the  guidance  robot  from  an  initial  user  have  shown  the 
ability  of  the  system  to  recognize  the  switch  between  the  users. 

We  tested  our  system  with  a  reasonable  environment  size  and  user  number,  but 
it  would  be  interesting  to  test  its  robustness  to  an  increased  user  number  or  to  com- 
plexer  user  patterns.  If  leading  to  pattern  overlaps,  the  system  should  still  be  able 
to  help  the  users  as  they  would  go  to  the  same  location.  However,  the  extreme  case 
would  be  if  there  is  a  uniform  distribution  on  the  goals  when  leaving  a  particular  one 
(e.g.  the  bathroom).  For  these  cases,  discretizing  the  user  sequences  over  different 
periods  of  the  day  or  increasing  the  history  length  should  help. 
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Appendix:  Submodels  for  the  goal  and  user  models 


In  this  appendix,  we  first  describe  the  submodel  used  for  building  the  goal  model  in  the  main 
system,  then  the  submodel  for  the  user  model.  In  the  goal  model  P(Gt\Gt-\  Rt-\  Ut-\  Giast), 
the  goal  at  time  t  is  dependent  from  a  lot  of  variables  issued  from  the  previous  time  step.  Some 
relations,  e.g.  between  Gt  and  Giast,  are  user-dependent  and  some  others,  e.g.  between  Gt  and  Rt, 
can  be  predefined.  A  submodel  is  a  Bayesian  model  on  his  own,  with  specific  relation  among  the 
variables,  which  is  then  used  in  order  to  infer  the  probability  distribution  over  some  variables  given 
some  other  known  variables.  In  the  case  of  the  goal  model,  our  submodel  is  defined  as  follows: 


P(G,  Gt-\  R,-i  Ut- 1  Giast)  =  P(G,- 1  Rt- 1)  P{G,\G,-i  Rt- 1)  P(U,- 1  Glas,\G,) 
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We  have  P{Gt~  1  Rt- 1)  which  is  a  uniform  distribution.  Then,  P(Gt\Gt-\  Rt~  1)  defines  the 
persistence  of  a  goal:  if  a  goal  is  not  reached,  the  probability  of  having  the  same  goal  in  mind 
is  much  higher  than  changing,  whereas  when  the  user  reaches  a  goal,  the  other  places  become 
equally  probable  with  a  preference  of  staying  in  the  current  place.  This  term  is  predefined. 
Finally,  the  P{Ut~\  Giast\Gt)  distribution  is  derived  from  a  histogram  learned  by  the  system 
when  a  user  stops  at  a  goal  location.  From  this  model,  we  compute  the  probability  distribution 
P(Gt\Gt-i  Rt- 1  Ut- 1  Giast),  which  is  then  used  in  the  main  model  as  the  goal  model,  as  follows: 

P(G,\G,-i  R,- 1  Ut- 1  Glast )  -  P(G,\G,-i  Glast\Gt) 

The  user  model  is  build  in  a  similar  manner  than  the  goal  model  and  is  the  following: 

P(Ut  ut- 1  Gt-i  Glast)  =  P{Ut- 1)  P(Ut\Ut-i)  P(Gt- 1  Giast\Ut) 

Again,  P(Ut-\)  is  a  uniform  distribution.  P(Ut\Ut-\ )  describes  the  persistence  of  the  human 
user,  i.e.  keeping  the  same  user  is  more  probable  than  switching  between  users.  P(Gt-\  Giast\Ut ) 
is  also  derived  from  a  histogram,  which  is  also  learned  when  a  user  reaches  a  goal  location.  The 
probability  distribution  computed  in  order  to  build  the  user  model  is  P(Ut\Ut-\  Gt  \  Giast ): 

P(Ut\Ut-i  Gt-1  Glast)  ~  P(Ut\Ut-i)P(Gt-i  Giast\Ut) 
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Applied  Imitation  Learning  for  Autonomous 
Navigation  in  Complex  Natural  Terrain 


David  Silver,  J.  Andrew  Bagnell,  Anthony  Stentz 
Robotics  Institute,  Carnegie  Mellon  University 


Abstract  Rough  terrain  autonomous  navigation  continues  to  pose  a  challenge  to 
the  robotics  community.  Robust  navigation  by  a  mobile  robot  depends  not  only  on 
the  individual  performance  of  perception  and  planning  systems,  but  on  how  well 
these  systems  are  coupled.  When  traversing  rough  terrain,  this  coupling  (in  the  form 
of  a  cost  function)  has  a  large  impact  on  robot  performance,  necessitating  a  ro¬ 
bust  design.  This  paper  explores  the  application  of  Imitation  Learning  to  this  task 
for  the  Crusher  autonomous  navigation  platform.  Using  expert  examples  of  proper 
navigation  behavior,  mappings  from  both  online  and  offline  perceptual  data  to  plan¬ 
ning  costs  are  learned.  Challenges  in  adapting  existing  techniques  to  complex  online 
planning  systems  are  addressed,  along  with  additional  practical  considerations.  The 
benefits  to  autonomous  performance  of  this  approach  are  examined,  as  well  as  the 
decrease  in  necessary  designer  interaction.  Experimental  results  are  presented  from 
autonomous  traverses  through  complex  natural  terrains. 


1  Introduction 

The  capability  of  autonomous  robotic  systems  to  successfully  navigate  through  un¬ 
structured  environments  continues  to  advance.  Ever  improving  high  resolution  sen¬ 
sors  and  perception  algorithms  allow  a  mobile  robot  to  build  a  detailed  model  of 
its  environment,  and  advances  in  planning  systems  allow  for  the  generation  of  ever 
more  complex  routes  and  trajectories  towards  achieving  a  navigation  goal.  However, 
as  perception  and  planning  systems  become  more  complex,  so  does  the  task  of  cou¬ 
pling  these  systems.  This  coupling  often  takes  the  form  of  a  Cost  Function.  Using 
data  from  the  perception  system  as  input,  the  cost  function  maps  to  a  scalar  cost 
value,  defined  over  the  state  space  of  the  planning  system  (Figure  1).  These  costs 
are  then  used  as  the  optimization  metric  by  the  planning  system  when  determining 
the  next  action  or  sequence  of  actions. 
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Fig.  1  Crusher  (Left)  is  capable  of  autonomous  navigation  through  complex  outdoor  terrain.  Per¬ 
ceptual  data  (Top  Right)  are  converted  to  costs  (Bottom  Right)  for  use  by  the  planning  system. 

In  simple  or  structured  environments,  cost  functions  are  often  easily  defined;  for 
instance,  in  an  indoor  environment  the  cost  of  traversable  freespace  should  be  very 
low,  and  the  cost  of  walls  or  other  obstacles  should  be  high.  However,  in  rough  or 
unstructured  terrain,  it  is  less  intuitive  how  to  define  cost.  A  small  obstacle  should 
clearly  have  larger  cost  than  flat  ground,  and  smaller  cost  than  a  large  obstacle. 
Explicitly  defining  these  tradeoffs  encodes  the  desired  behavior  of  the  robot  and 
is  quite  challenging;  defining  a  generalizeable  function  that  maps  from  perceptual 
inputs  to  the  proper  cost  is  even  more  so. 

This  first  step  of  defining  the  relative  cost  of  various  terrains  requires  a  concrete 
definition  of  what  metric  a  robot’s  performance  will  be  measured  against.  Com¬ 
mon  metrics  include  maximizing  safety  or  probability  of  success,  minimizing  dis¬ 
tance  traveled  or  time  taken,  minimizing  net  energy  loss,  minimizing  observability 
or  maximizing  sensor  coverage.  Often,  the  actual  desired  robot  behavior  optimizes 
a  combination  of  such  metrics;  for  example,  it  may  be  desirable  for  a  robot  to  ap¬ 
proximately  maximize  safety  but  take  certain  risks  to  minimize  distance  traveled. 

Previous  work  has  focused  on  several  differing  approaches.  Attempts  to  explic¬ 
itly  approximate  traversability  through  simulation  [3]  or  proprioception  [5]  limit  the 
choice  of  metrics  to  maximizing  safety;  they  also  require  a  robot  model  capable 
of  directly  computing  probabilities  of  mobility  failure.  Approaches  focused  on  ex¬ 
plicitly  combining  multiple  metrics  are  limited  to  optimizing  one  metric  subject  to 
constraints  on  others  [13,  15].  The  most  common  general  solution  is  to  manually 
design  and  hand  tune  a  cost  function  until  the  robot  achieves  the  desired  behavior. 
This  can  be  an  incredibly  tedious  process  (as  it  requires  a  manual  optimization  in 
a  potentially  high  dimensional  space)  and  often  results  in  systems  that  suffer  from 
poor  generalization  and  a  lack  of  robustness  to  novel  scenarios. 

This  paper  explores  the  application  of  imitation  learning  to  this  challenge,  specif¬ 
ically  the  LEARCH  [10]  algorithm  described  in  the  next  section.  The  application 
of  this  approach  to  the  Crusher  autonomous  navigation  platform  [14]  (Figure  1)  is 
reviewed,  along  with  a  discussion  of  practical  considerations  when  applying  this 
approach. 
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Co  =  Prior ; 

for  i  =  1...T  do 
foreach  Pe  do 

foreachx  e  getBoundingBox  (Pe)  do 

F[x]  =  getPerceptionFeatures  (jc)  ; 

M[x]  =  Q- 1  (F  [x] )  +  Le  (x) ; 

P*  =  planPath  ( se,geiM )  ; 

P*  =  replanExamplePath  (A,/3,M)  ; 

{£/+%!/?}  =  computeVisitationCounts  (P* ,P* )  ; 
Ri  =  trainBalancedRegressor  (F,  U+,  U_ )  ; 

Q  =  Q-i  *er,iRi; 

Fig.  2  The  LEARCH  algorithm 


2  Imitation  Learning 

Although  explicitly  defining  the  relative  tradeoffs  between  different  actions  is  a  dif¬ 
ficult  task  for  a  domain  expert,  indicating  or  demonstrating  examples  of  correct  be¬ 
havior  is  often  easier  (otherwise,  the  task  itself  is  not  well  defined).  Therefore,  the 
imitation  learning  framework  seeks  to  learn  the  correct  robot  behavior  from  obser¬ 
vation  of  expert  behavior.  Many  previous  applications  of  this  framework  to  mobile 
robots  [6,  7]  sought  to  learn  to  predict  what  action  to  perform,  based  on  the  action 
performed  by  an  expert  at  a  certain  state.  In  this  way,  action  prediction  essentially 
replaces  the  lower  level  motion  planning  operation  on  a  mobile  robot.  However,  this 
approach  is  inherently  myopic  and  does  not  scale  to  longer  range  planning,  as  it 
requires  all  necessary  information  to  be  encoded  in  the  current  robot  state. 

Therefore,  rather  than  learn  a  mapping  from  features  of  a  state  to  actions,  we  seek 
to  learn  a  mapping  from  features  of  a  state  to  costs,  such  that  the  planning  system 
will  produce  the  correct  behavior  when  provided  with  said  costs.  This  approach  has 
its  roots  in  the  concept  of  Inverse  Optimal  Control ,  and  has  recently  been  developed 
for  use  in  robotic  systems  [1,  8].  By  learning  a  cost  function  to  reproduce  expert 
behavior,  the  need  for  explictly  defining  a  metric  or  weighting  between  metrics  is 
eliminated;  the  new  metric  is  matching  human  performance,  and  it  is  left  up  to 
the  human  expert  to  define  (through  behavior)  how  to  balance  various  options  and 
considerations. 

It  is  important  to  note  that  this  approach  learns  the  correct  cost  function  for  a 
specific  planner  or  system  of  planners,  and  maps  to  cost  from  a  specific  perception 
system.  The  purpose  is  not  to  try  and  improve  the  separate  performance  of  these 
systems;  rather,  it  is  to  optimize  the  coupling  of  these  modules  to  provide  the  best 
overall  system  performance.  As  the  Crusher  system  operates  with  costs  defined  over 
a  2D  grid,  subsequent  descriptions  will  deal  specifically  with  this  setting.  Without 
loss  of  generality,  it  is  easiest  for  now  to  consider  the  planning  system  as  a  basic 
A*  planner,  and  a  planned  path  as  a  sequence  of  2D  grid  cells.  Adaptation  to  more 
complex  planning  systems  is  covered  in  the  next  section. 
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Our  imitation  learning  approach  is  based  on  the  LEARCH  algorithm,  (Figure  2); 
for  a  full  derivation,  see  [10].  The  input  to  the  algorithm  is  a  set  of  example  paths, 
each  a  sequence  of  2D  locations  leading  from  a  start  s  to  a  goal  g  and  representing 
the  correct  path  (according  to  the  expert).  LEARCH  seeks  to  find  a  cost  function  C 
such  that  each  example  path  Pe  is  the  planner  output  under  the  cost  function.  The 
LEARCH  inner  loop  iterates  through  each  example.  For  each  example  path  Pe  with 
start  and  goal  se  and  ge  a  path  P*  is  planned  under  the  current  candidate  cost  function 
Q  (Co  can  be  initialized  to  any  prior).  Since  R*  is  the  output  of  an  optimal  planner, 
Ci{p, o  <  Ci(Pe).  Since  we  desire  a  C  such  that  C(Pe)  =  C(P*),  we  seek  to  minimize 
the  difference  in  cost  C(Pe)  —  C(P*).  As  the  cost  of  a  path  is  simply  the  sum  of  costs 
at  states  along  it,  Pe  and  R*  provide  a  list  of  states  where  the  cost  could  be  changed 
to  lower  this  cost  difference:  states  in  Pe  could  have  their  cost  lowered  and  states  in 
P*  could  have  their  cost  raised  (states  in  both  simply  cancel). 

This  list  of  candidate  states  (called  the  visitation  counts)  provides  a  local  gradi¬ 
ent  in  the  space  of  cost  functions.  However,  the  cost  at  each  state  can  not  simply 
be  raised  or  lowered,  as  the  goal  is  not  to  identify  the  correct  cost  for  each  cell,  but 
rather  a  function  that  maps  perceptual  features  to  an  appropriate  cost.  If  a  function 
AQ  could  be  found  that  approximated  this  gradient  (the  output  is  positive  or  nega¬ 
tive  when  provided  with  the  appropriate  features),  adding  it  to  Q  would  lower  the 
cost  difference. 

Finding  a  general  function  to  match  a  list  of  input/output  pairs  can  be  solved 
through  regression  analysis.  In  this  case,  the  inputs  are  well  defined  (perceptual 
features),  but  the  outputs  are  not;  for  each  input,  the  required  cost  delta  is  not  known, 
just  its  sign.  Therefore,  outputs  targets  are  specified  as  ±1  depending  on  whether 
the  costs  need  to  be  raised  or  lowered.  In  this  way,  the  regressor  R  generalizes  the 
local  cost  changes  over  the  entire  feature  space.  Each  regression  target  can  also  be 
weighted  to  indicate  that  certain  cost  changes  are  more  important  relative  to  others. 
Determining  these  relative  weights  is  discussed  in  Section  3. 

The  final  learning  procedure  is  summarized  as  follows:  for  each  example  path 
and  the  corresponding  plan  (under  Q),  compute  the  set  of  visitation  counts  and  the 
corresponding  perceptual  features.  Next  train  a  regressor  R  over  these  input/output 
pairs,  and  combine  it  with  Q,  weighted  by  a  learning  rate  parameter  tj.  The  cost  of 
cells  along  an  example  path  (which  an  expert  specifically  chose  to  encounter)  will 
generally  be  lowered,  while  the  cost  of  cells  along  a  temporarily  cheaper  path  (which 
the  expert  chose  to  avoid)  will  generally  be  increased.  This  loop  is  then  iterated  until 
convergence.  Figure  3  provides  a  visual  example  of  this  procedure  in  action. 

A  few  details  remain.  Rather  than  summation,  we  use  an  update  rule  of  Q+i  (/)  = 
Ci(f)er,Ri^\  resulting  in  exponentiated  functional  gradient  descent  [10]  which 
makes  better  use  of  available  dynamic  range,  as  well  as  naturally  enforcing  a  posi¬ 
tivity  constraint  on  costs.  The  choice  of  regressor  (e.g.  linear,  neural  net,  etc.)  is  also 
unspecified.  This  decision  helps  define  the  balance  between  descriptiveness  and  gen¬ 
eralization  in  the  space  of  possible  cost  functions,  and  is  discussed  further  in  Section 
3.  Finally,  by  augmenting  costs  with  a  margin  (determined  by  a  loss  function  Le(x)), 
trivial  cost  solutions  can  be  eliminated  and  generalization  improved. 
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Fig.  3  An  example  of  the  LEARCH  algorithm  learning  to  interpret  satellite  imagery  (Top)  as  costs 
(Bottom).  As  the  cost  function  evolves  (left  to  right),  the  current  plan  (green)  recreates  more  and 
more  of  the  example  plan  (red).  Quickbird  imagery  courtesy  of  Digital  Globe,  Inc. 


In  addition  to  the  algorithm  as  described,  there  are  a  few  modifications  that  can 
increase  robustness  to  noisy  or  imperfect  expert  demonstration  [11,  12].  Human  ex¬ 
perts  rarely  demonstrate  exact  optimal  behaviors,  especially  in  large  areas  of  similar 
terrain.  By  performing  a  balanced  regression  (normalizing  relative  weights  such  that 
positive  and  negative  targets  have  equal  total  weight),  the  regressor  can  be  forced 
to  more  strictly  separate  between  terrains  when  changing  cost.  Another  addition  to 
the  algorithm  is  to  continually  replan  the  example  path.  Human  demonstration  often 
contains  a  degree  of  high  frequency  noise;  a  smoothing  operation  is  therefore  bene¬ 
ficial.  This  smoothing  can  be  performed  by  selecting  a  new  example  that  is  entirely 
contained  within  a  corridor  of  width  j 3  around  the  original  example.  The  new  exam¬ 
ple  is  created  by  planning  the  optimal  path  using  the  current  cost  function  within  the 
corridor  (and  infinite  cost  elsewhere).  This  has  the  effect  of  adapting  the  example 
to  the  current  cost  hypothesis  at  a  small  scale  (implicitly  defined  by  /3),  while  still 
adapting  the  hypothesis  to  the  example  at  a  large  scale. 


3  Application  to  Autonomous  Navigation 

Our  imitation  learning  approach  was  applied  to  the  task  of  interpreting  perceptual 
data  for  the  purpose  of  motion  planning  on  the  Crusher  system.  Crusher  is  provided 
with  two  main  forms  of  perceptual  data:  static  sources  of  prior  data  (overhead  im¬ 
agery  and  LiDAR),  and  dynamic  sources  of  data  collected  in  real  time  (onboard 
cameras  and  LiDAR).  Once  these  data  sources  have  been  converted  to  2D  cost  grids 
and  fused  together  (at  the  cost  level),  Crusher’s  motion  planning  system  is  respon¬ 
sible  for  choosing  vehicle  actions.  The  motion  planning  system  is  similar  to  [4], 
and  combines  a  global  planner  based  on  Field  D*  [2]  and  a  local  planner  based 
on  forward  simulation  of  potential  vehicle  actions  (specifically  constant  curvature 
commands)  for  a  fixed  horizon. 

For  the  Crusher  platform,  imitation  learning  was  first  applied  to  the  task  of  inter¬ 
preting  overhead  data  to  create  prior  cost  maps  [1 1].  2D  feature  maps  are  extracted 
from  the  input  data  sources,  and  then  converted  from  maps  of  features  to  a  map  of 
costs.  These  maps  are  then  used  for  global  route  planning  offline,  as  well  as  online 
global  planning  when  fused  with  current  perceptual  data.  This  context  provides  an 
ideal  setting  for  the  application  of  the  LEARCH  algorithm  due  to  the  static  nature 
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of  the  perceptual  data,  and  the  ease  of  collecting  training  examples:  each  example 
is  simply  a  path  that  can  be  ’drawn’  by  an  expert  on  top  of  imagery  or  other  visual¬ 
ization  of  the  underlying  data. 

Since  overhead  costs  are  only  used  for  planning  in  regions  that  have  not  yet 
been  directly  observed  by  the  robot,  overhead  costs  are  learned  with  respect  to  the 
Field  D*  global  planner.  As  Field  D*  plans  an  interpolated  path  over  a  2D  grid, 
computing  visitation  counts  is  not  as  straightforward  as  simply  marking  which  states 
each  path  traverses  through.  Instead,  the  distance  traveled  through  each  grid  cell 
must  be  recorded.  This  results  in  visitation  counts  that  are  real  valued  instead  of 
binary.  During  regression  the  output  target  for  a  real  valued  visitation  count  is  still 
d=l,  but  the  target  is  now  weighted  relative  to  the  visitation  count.  If  a  path  passes 
through  cell  x\  for  twice  as  long  as  X2,  then  x\  has  twice  the  impact  on  the  cost  of  a 
path;  moving  the  cost  in  the  right  direction  is  therefore  twice  as  important. 

Like  many  motion  planning  algorithms,  Field  D*  also  makes  use  of  a  configura¬ 
tion  space  expansion  to  account  for  the  dimensions  of  the  vehicle.  A  configuration 
space  expansion  also  results  in  non-binary  visitation  counts;  it  is  taken  into  account 
by  incrementing  the  count  of  all  states  xj  relative  to  their  contribution  to  the  cost 
of  state  Xi  when  Xi  is  on  a  path.  Crusher’s  planning  system  performs  an  expansion 
by  averaging  costs  over  a  circular  window.  Therefore,  for  a  path  traveling  distance 
d  through  cell  jc/,  all  cells  xj  within  the  expansion  window  have  their  counts  incre¬ 
mented  by  d.  More  complex  expansions  can  be  accounted  for  in  the  same  manner. 

Imitation  learning  was  next  used  to  learn  costs  from  features  generated  by 
Crusher’s  onboard  perception  system.  Crusher’s  perception  software  processes  raw 
sensor  data  into  feature  descriptions  of  voxels  in  real  time;  each  column  of  voxels 
is  then  converted  into  a  2D  cost.  Therefore,  unlike  learning  from  overhead  data, 
features  are  not  static.  While  additional  adaption  of  the  LEARCH  algorithm  is  re¬ 
quired  in  order  to  deal  with  the  dynamic  and  unknown  nature  of  real  time  perceptual 
data,  the  approach  remains  conceptually  similar  and  will  be  treated  as  such  moving 
forward;  for  details  see  [12]. 

Instead  of  drawing  a  path  on  top  of  a  visualization,  collecting  expert  examples  to 
train  the  perception  system  consists  of  the  expert  manually  driving  Crusher  through 
an  example  behavior.  Along  with  the  path  traversed,  all  raw  sensor  data  is  logged 
during  this  collection.  Sensor  data  is  then  post-processed  via  playback  through 
Crusher’s  perception  software  to  generate  the  final  features  that  will  be  converted 
into  costs.  By  logging  the  raw  sensor  data,  perception  software  does  not  need  to 
remain  static  after  training  data  collection.  Whenever  changes  or  improvements  are 
made  to  perception  software,  features  can  simply  be  regenerated,  and  a  cost  function 
relearned.  Therefore,  training  data  in  this  form  does  not  need  to  be  recollected  every 
time  the  system  changes;  the  cost  function  is  simply  retrained  offline. 

Since  costs  from  online  perceptual  data  determine  Crusher’s  actual  motion  com¬ 
mands,  costs  must  be  learned  with  respect  to  the  local  planning  system.  Figure  4(a) 
provides  a  simplified  example  to  demonstrate  why  this  is  so.  If  costs  were  trained 
with  respect  to  the  global  planner,  LEARCH  would  be  satisfied  with  the  cost  on  the 
obstacle  O  when  it  is  sufficiently  high  to  make  up  for  the  extra  distance  \Pg  \  —  |PC|. 
However,  since  I^U^|>|Pc|  +  O  >  \Pg\,Pc  remains  the  cheapest  local  planner  op- 
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Fig.  4  (a)  A  simplified  scenario  where  the  local  planner  has  only  3  possible  actions,  (b)  Example 
of  a  new  feature  (right)  learned  from  panchromatic  imagery  (left). 

tion  in  this  case.  The  result  would  be  that  the  local  planner  would  still  choose  to 
drive  over  the  obstacle.  This  result  is  observed  empirically  in  Section  4. 

Unfortunately,  there  are  also  problems  with  training  directly  for  the  local  planner. 
As  the  local  planner  only  considers  a  discrete,  kinematically  feasible  set  of  actions,  it 
is  often  the  case  that  no  series  of  actions  will  sufficiently  match  the  expert  example. 
In  this  case,  the  LEARCH  termination  condition  is  undefined.  Terminating  when 
the  example  path  is  lower  cost  than  the  planned  path  will  not  suffice;  in  Figure  4(a) 
this  could  result  in  C(Pe)  <C(Pc)  <  C(P/),C(Pr)  (the  colliding  action  would  still  be 
preferred).  Running  until  the  cost  function  converges  is  therefore  necessary,  but  has 
its  own  side  affects.  Once  C(Pc)  >  C(P/),C(Pr),  LEARCH  will  start  to  try  and  raise 
the  cost  along  P[  or  Pr.  If  the  chosen  regressor  can  differentiate  between  the  terrain 
under  P[  or  Pr  and  that  under  Pe ,  it  will  raise  those  costs  without  proper  cause.  The 
end  result  is  a  potential  addition  of  noise  to  the  final  costs,  and  lower  generalization. 
The  degree  of  this  noise  depends  on  the  resolution  of  the  planner  and  the  regressor. 

If  there  were  some  way  to  know  the  ’right’  planner  action,  then  the  selection 
of  that  action  could  serve  as  a  termination  condition.  Collecting  this  information 
during  demonstration  by  an  expert  would  be  extremely  tedious,  requiring  an  ex¬ 
pert  selection  at  every  planning  cycle.  Instead,  we  propose  the  use  of  a  heuristic 
approach  to  approximate  this  decision.  Essentially,  we  seek  to  ’project’  the  expert’s 
example  behavior  onto  the  space  of  possible  planner  actions.  This  is  performed  by 
first  learning  a  perception  cost  function  for  the  global  planner.  As  described  above, 
such  a  cost  function  will  generally  underestimate  the  cost  necessary  for  the  local 
planner.  Therefore,  we  score  each  local  planner  action  by  its  average  cost  instead  of 
total  cost1 .  An  action  with  low  average  cost  can  not  be  said  to  be  optimal,  but  it  at 
least  traverses  desirable(low  cost)  terrain.  An  additional  distance  penalty  is  added  to 
bias  action  scores  towards  those  that  make  progress  towards  the  goal2.  After  scoring 
each  action,  that  with  the  lowest  score  is  used  as  the  new  example.  The  result  of  this 
initial  replanning  step  is  to  produce  a  new  example  behavior  that  is  feasible  to  the 
local  planner. 


1  The  global  planner  section  of  each  action  is  still  computed  with  respect  to  total  cost 

2  the  weight  of  this  penalty  can  be  automatically  tuned  by  optimizing  performance  on  a  validation 
set,  without  any  hand  tuning 


David  Silver,  J.  Andrew  Bagnell,  Anthony  Stentz 


When  dealing  with  static  overhead  data  a  cost  function,  once  learned,  will  gener¬ 
ally  only  be  applied  once  through  a  data  set.  In  contrast,  a  perception  cost  function 
will  be  continually  applied  in  real  time  on  a  robot  when  operating  autonomously. 
Therefore,  it  is  important  that  the  cost  function  be  computationally  inexpensive.  As 
combining  multiple  linear  functions  yields  a  single  linear  function,  linear  regressors 
have  a  significant  computational  advantage  over  nonlinear  regressors  (which  would 
require  a  separate  evaluation  per  regressor).  Unfortunately,  they  also  suffer  from 
limited  expressiveness.  This  can  be  dealt  with  by  adding  a  feature  learning  phase, 
as  described  in  [9,  11].  Such  a  phase  automatically  decides  to  occasionally  learn 
simple  non-linear  combinations  of  the  original  input  features  that  help  differentiate 
terrains  that  are  difficult  for  a  linear  cost  function  to  discriminate.  This  approach  is 
similar  to  the  way  in  which  cost  functions  are  often  hand-engineered:  simple  lin¬ 
ear  functions  handle  the  general  cases,  with  sets  of  rules  to  handle  difficult  special 
cases.  Additionally,  such  new  features  can  be  used  as  a  guide  in  the  development  of 
further  engineered  features.  Figure  4(b)  provides  an  example  in  the  overhead  con¬ 
text,  demonstrating  a  new  feature  derived  from  only  panchromatic  satellite  imagery. 
This  new  feature  strongly  disambiguates  roads  and  trails  from  surrounding  terrain, 
and  could  be  taken  as  an  indication  that  an  explicit  road  extractor  would  be  useful. 


4  Field  Results  and  Conclusions 

The  described  imitation  learning  approach  was  implemented  to  learn  mappings  from 
both  overhead  and  onboard  perceptual  data  to  cost.  As  described  in  [10],  this  ap¬ 
proach  is  guaranteed  to  converge  when  using  a  properly  chosen  learning  rate  3 .  In 
practice,  a  decaying  learning  rate  of  the  form  7]  / yfn  is  used  at  the  nth  iteration.  The 
parameter  77  affects  only  the  rate  of  convergence;  a  well  chosen  77  usually  results 
in  convergence  after  approximately  50  -  100  iterations.  The  computation  required  at 
each  iteration  for  each  example  is  dominated  by  the  cost  of  applying  the  current  cost 
function  to  local  feature  maps,  and  then  planning  through  the  resulting  cost  map.  For 
the  training  sets  used  in  this  work,  computation  per  iteration  was  approximately  5 
minutes  on  a  2.4  Ghz  processor4. 

The  Crusher  autonomy  system  originally  made  use  of  hand-tuned  cost  functions 
for  converting  overhead  and  perception  features  to  costs.  Comparing  autonomous 
performance  when  using  different  cost  functions  can  quantify  the  differences  in  per¬ 
formance  brought  about  by  these  different  approaches.  Engineered  prior  cost  maps 
were  used  for  the  first  4  Crusher  field  experiments.  This  process  consisted  of  first 
performing  a  supervised  classification  of  the  raw  feature  maps,  and  then  converting 
the  classifier  outputs  into  costs.  This  lossy  compression  of  the  feature  space  was 
performed  to  make  designing  a  cost  function  easier  and  more  intuitive.  As  different 
test  sites  provided  differing  data  sources  and  resolutions,  this  process  was  repeated 

3  Use  of  a  smoothing  corridor  removes  the  theoretical  guarantee;  however  in  practice  this  has  not 
proven  to  affect  convergence 

4  If  faster  learning  is  required,  LEARCH  can  be  parallelized  by  example  at  each  iteration 
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for  each  test  site.  Additionally,  the  desire  to  create  cost  maps  from  different  subsets 
and  resolutions  of  prior  data  (in  order  to  perform  resolution  comparisons),  meant 
multiple  cost  functions  were  necessary.  For  each  site,  labeling  training  data  and  de¬ 
termining  parameters  for  multiple  cost  functions  would  involve  on  average  more 
than  a  day  of  a  domain  expert’s  time.  Learned  prior  cost  maps  were  then  used  for 
the  remaining  6  field  experiments.  For  each  site,  producing  a  series  of  example  paths 
would  take  on  average  only  1-2  hours  of  an  expert’s  time.  These  examples  could  then 
be  used  to  train  multiple  cost  maps  using  different  data  sources  and  resolutions. 

In  a  timed  experiment  on  a  2  km2  test  site,  producing  a  supervised  classification 
required  40  minutes  of  expert  involvement,  and  tuning  a  cost  function  required  an 
additional  20  minutes.  In  contrast,  producing  example  paths  required  only  12  min¬ 
utes.  As  Crusher  has  been  tested  on  sites  ranging  up  to  200  km2,  this  time  savings 
is  magnified  in  importance.  The  final  cost  maps  were  also  evaluated  by  comparing 
planned  routes  to  an  independent  validation  set  of  examples.  The  engineered  map 
produced  routes  that  matched  44%  of  states  along  validation  paths  on  average.  Using 
imitation  learning  to  learn  just  the  correct  weights  for  the  supervised  classification 
produced  a  map  that  scored  48%.  Imitation  learning  from  the  raw  features  scored 
57%.  This  result  demonstrates  that  the  automated  approach  performs  superior  pa¬ 
rameter  tuning,  and  makes  better  use  of  all  the  available  raw  data.  It  has  also  been 
shown  that  Crusher  navigates  more  efficiently  when  using  learned  prior  maps  online 
as  opposed  to  engineered  maps,  driving  safer  routes  at  faster  speeds  [11]. 

During  the  more  than  3  years  of  the  Crusher  program,  an  engineered  perception 
cost  function  was  continually  redesigned  and  retuned,  culminating  in  a  high  perfor¬ 
mance  system  [14].  However,  this  performance  came  at  a  high  cost.  Version  control 
logs  indicate  that  145  changes  were  made  to  just  the  form  of  the  cost  function;  ad¬ 
ditionally  more  than  300  parameter  changes  were  checked  in.  As  each  committed 
change  requires  significant  time  to  design,  implement,  and  validate,  easily  hundreds 
of  hours  were  spent  on  engineering  the  cost  function.  In  contrast,  the  final  training 
set  used  to  learn  a  cost  function  consisted  of  examples  collected  in  only  a  few  hours. 

The  performance  of  different  perception  cost  functions  was  compared  through 
over  150  km  of  comparison  trials.  The  final  results  comparing  4  different  cost  func¬ 
tions  are  presented  in  Table  1 .  In  comparison  to  the  engineered  system,  a  cost  func¬ 
tion  learned  for  the  global  planner  resulted  in  overly  aggressive  performance.  As 
discussed  in  Section  3,  learning  in  this  manner  does  not  result  in  sufficiently  high 
costs;  the  result  is  that  Crusher  drives  faster  and  turns  less  while  appearing  to  suffer 
from  increased  mobility  risk.  In  contrast,  the  costs  learned  for  the  local  planner  per¬ 
formed  very  similarly  to  the  high  performance  of  the  engineered  system.  Addition¬ 
ally,  adding  an  initial  replanning  step  further  improved  performance;  by  reducing 
cost  noise,  average  speed  increased,  with  a  decrease  in  turns  and  direction  switches, 
and  no  increase  in  mobility  risk. 

In  conclusion,  this  work  has  demonstrated  the  applicability  of  imitation  learning 
towards  improving  the  robustness  of  autonomous  navigation  systems,  while  helping 
to  minimize  the  necessary  amount  of  expert  interaction.  Specifically,  the  parame¬ 
ter  tuning  problem  that  often  results  from  the  coupling  of  complex  perception  and 
planning  systems  can  be  automated  through  expert  demonstration  instead  of  expert 


10 


David  Silver,  J.  Andrew  Bagnell,  Anthony  Stentz 


Table  1  Averages  over  295  different  waypoint  to  waypoint  trials  per  perception  system,  totaling 
over  150km  of  traverse.  Statistically  significant  differences  (from  Engineered)  denoted  by  * 


System 

Avg.  Distance 
Made  Good  (m) 

Avg.  Cmd. 
Vel.  (m/s) 

Avg.  Cmd. 
Ang.  Vel.(o/i) 

Avg.  Lat. 
Vel.  (m/s) 

Dir  Switch 

Per  m 

Avg.  Motor 
Current  (A) 

Avg. 

Roll(o) 

Avg. 

Pitch(o) 

Avg  Vert. 
Accel  (m/s^) 

Avg  Lat. 
Accel  (m/s^ ) 

Susp. 
MaxA  (m) 

Safety 

E-stops 

Engineered 

130.7 

3.24 

6.56 

0.181 

0.107 

7.53 

4.06 

2.21 

0.696 

0.997 

0.239 

0.027 

Global 

123.8* 

3.34* 

4.96* 

0.170* 

0.081* 

7.11* 

4.02 

2.22 

0.710* 

0.966* 

0.237 

0.054* 

Local 

127.3 

3.28 

5.93* 

0.172* 

0.100 

7.35 

4.06 

2.22 

0.699 

0.969* 

0.237 

0.034 

Local  w/replan 

124.3* 

3.39* 

5.08* 

0.170* 

0.082* 

7.02* 

3.90* 

2.18 

0.706* 

0.966* 

0.234* 

0.030 

intervention.  In  the  future,  we  wish  to  expand  this  approach  to  also  automate  the 
selection  of  parameters  internal  to  a  planning  system,  further  reducing  the  need  for 
human  tuning. 

This  work  was  sponsored  by  DARPA  under  contract  “Unmanned  Ground  Com¬ 
bat  Vehicle  -  PerceptOR  Integration”  (contract  MDA972-01 -9-0005)  and  by  the  U.S. 
Army  Research  Laboratory  under  contract  “Robotics  Collaborative  Technology  Al¬ 
liance”  (contract  DAAD 19-0 1-2-00 12).  The  views  and  conclusions  contained  in  this 
document  are  those  of  the  authors  and  should  not  be  interpreted  as  representing  the 
official  policies,  either  expressed  or  implied,  of  the  U.S.  Government. 
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Abstract  This  paper  introduces  a  cognitive  architecture  that  allows  service  robots 
to  cooperate  with  smart  environments  to  fulfill  user  assigned  global  tasks,  e.g., 
the  transport  of  various  kinds  of  material  in  human  populated  environments,  such 
as  hospitals.  Specifically,  knowledge  representation  issues  and  context  assessment 
strategies  are  introduced,  which  allows  event  monitoring  and  coordination  between 
service  robots  and  intelligent  applicances  to  occur.  Experimental  evaluation  is  thor¬ 
oughly  described. 


1  Introduction 

During  the  past  few  years,  the  novel  paradigm  of  Ubiquitous  Robotics  attracted 
many  researchers  worldwide  [6,  14]:  mobile  robots  are  considered  part  of  a  fully 
networked  smart  environment  that  comprises  intelligent  devices  in  charge  of  sensing 
or  acting,  e.g.,  acquiring  data  from  distributed  sensors  or  operating  automated  doors, 
elevators  or  more  complex  machines.  Since  intelligence  is  distributed,  tasks  that 
usually  require  robots  to  exhibit  sophisticated  interaction  capabilities  can  be  mapped 
to  simple  coordination  activities  between  robots  and  intelligent  devices  managed 
through  information  exchange. 

For  instance,  let’s  consider  a  service  robot  for  waste  material  transportation  in  a 
hospital.  If  it  were  fully  autonomous  it  would  be  able  to  locate,  grasp,  load,  trans¬ 
port  and  unload  waste  boxes  once  arrived  in  the  storage  site.  This  sequence  of  steps 
involves  difficult  subtasks:  among  them,  identifying  the  box  to  load  requires  sophis¬ 
ticated  data  processing  techniques,  grasping  (if  the  dimensions  of  the  box  are  appro¬ 
priate)  involves  a  tight  coordination  between  a  coupled  robotic  arm  and  the  robot, 
whereas  transportation  may  require  (beside  the  usual  specifications  in  terms  of  safe 
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navigation)  to  switch  among  different  floors  in  the  hospital,  thereby  involving  the 
interaction  with  elevators,  e.g.,  selecting  the  appropriate  floor  and  deciding  what  to 
do  if  the  elevator  is  full  of  people.  On  the  contrary,  Ubiquitous  Robotics  enforces 
the  idea  that  many  cooperating  intelligent  appliances  can  collectively  fulfill  a  global 
task:  waste  boxes  can  be  easily  located  in  the  environment  by  tracking  embedded 
RFID  tags  using  proper  antennas,  intelligent  loaders  or  shelves  can  deposit  the  box 
directly  above  the  robot  as  the  latter  notifies  to  be  in  the  proper  position,  whereas 
intelligent  elevators  can  receive  requests  from  and  communicate  back  to  the  robot 
about  their  status. 

The  research  described  in  this  paper  originates  from  a  real-world  problem:  the 
deployment  of  a  team  of  mobile  robots  at  the  Polyclinic  of  Modena  (Italy),  whose 
purpose  is  to  transport  waste  material  and  other  objects  from  different  floors  to  the 
waste  storage  site.  Such  a  system  must  be  able  to:  (i)  receive  high  level  tasks  and 
break  them  for  execution;  (ii)  exhibit  24/7  performance  capabilities;  (iii)  operate 
with  the  minimum  human  intervention  as  possible.  In  particular,  we  focus  on  a  sig¬ 
nificant  aspect  of  the  problem:  given  a  high  level  plan  to  he  executed  hy  different 
agents  (both  mobile  robots  and  other  intelligent  devices,  such  as  automated  eleva¬ 
tors)  in  a  coordinated  fashion,  how  to  monitor  its  execution  over  time?  Furthermore, 
we  concentrate  on  the  coordination  between  mobile  robots  and  automated  elevators 
for  managing  the  operation  of  floor  switching. 

The  goal  of  this  paper  is  to  describe  the  use  of  context  assessment  strategies 
to  monitor  sequences  of  events  as  they  are  performed  both  by  mobile  robots  and 
intelligent  devices.  The  proposed  architecture  adopts  ontologies  to  represent  rela¬ 
tionships  among  different  pieces  of  knowledge  and  to  provide  the  information  flow 
with  semantically  consistent  labeling.  The  paper  is  organized  as  follows.  Section  2 
discusses  relevant  literature  about  Ubiquitous  Robotics  systems  and  context  models. 
Next,  the  proposed  framework  for  context  assessment  is  described.  Finally,  Section 
4  discusses  an  example  of  robot-elevator  coordination.  Conclusion  follows. 


2  Related  Work 

Models  for  realizing  context  aware  artificial  systems  are  subject  to  three  require¬ 
ments:  (i)  effectiveness  in  assessing  sensory  information;  (ii)  reusability  in  compos¬ 
ing  contexts  to  generate  composite  representations;  (iii)  expressiveness  in  represent¬ 
ing  both  relational  and  temporal  patterns  of  events.  The  notion  of  “context”  has 
been  defined  as  “any  information  that  can  be  used  to  characterize  the  situation  of  an 
entity”  [3,  4,  9].  Whereas  in  human-machine  interaction  an  entity  is  usually  identi¬ 
fied  as  a  person,  in  Ubiquitous  Robotics  it  can  be  a  person,  a  robot,  an  automated 
elevator  or  even  a  collection  of  many  agents. 

Many  systems  suitable  to  be  labelled  as  Ubiquitous  Robotics  have  been  pre¬ 
sented  in  literature  [14].  However,  if  we  focus  on  those  systems  specifically  tak¬ 
ing  even  simple  forms  of  context  awareness  into  account,  this  number  drastically 
reduces  to  few  examples.  Within  the  Ambience  project  [19],  mobile  robots  are  cou- 
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pled  with  smart  environments  to  manage  tasks  related  to  human-robot  interaction 
and  emotional  awareness.  However,  a  principled  context  model  is  neither  acquired 
nor  represented  in  the  system,  thus  lacking  in  reusability  and  expressivenness.  The 
Ubibot  paradigm  [6]  is  aimed  at  realizing  ubiquitous  systems  which  are  composed 
of  three  main  constituents:  Mobots  (i.e.,  mobile  robots),  Embots  (i.e.,  embedded 
robots:  agents  in  charge  of  performing  data  collection)  and  Sobots  (i.e.,  software 
robots:  agents  implementing  computational  and  cognitive  algorithms).  A  properly 
defined  context  model  would  require  a  tight  integration  of  Mobots,  Embots  and 
Sobots.  However,  just  few  informations  about  user  contexts  are  maintained  by  the 
system,  which  lacks  in  effectiveness  and  reusability.  Finally,  as  part  of  the  PEIS 
Ecology  framework,  and  grounded  with  respect  to  the  specific  field  of  cooperative 
anchoring ,  the  work  described  in  [8]  is  one  of  the  first  attempts  to  populate  knowl¬ 
edge  bases  from  distributed  sources  of  information.  Unfortunately,  issues  related 
to  reusability  and  expressiveness  are  not  sufficiently  structured  for  the  purpose  of 
properly  defining  a  context  model. 

In  order  to  investigate  the  role  of  context  assessment  to  monitor  the  behavior 
of  robots  in  smart  environments,  it  seems  convenient  to  refer  to  context  models 
proposed  in  related  research  fields. 

Ontology  approaches  provide  a  superior  expressiveness  in  describing  concepts 
and  relationships  [18,  7].  The  work  described  in  [15]  adopts  activity  models  to  char¬ 
acterize  a  situation,  thereby  defining  different  states  for  an  agent.  Activity  models 
can  refer  to  many  agents:  a  situation  is  thus  originated  from  contexts  affecting  dif¬ 
ferent  individuals.  Crowley  and  colleagues  [2]  extend  this  principle  by  considering 
contexts  as  networks  of  basic  building  blocks:  different  properties  of  an  agent  are 
obtained  by  filling  particular  roles  and  relationships  among  different  contexts.  The 
network  model  abstracts  from  the  source  of  a  particular  information,  therefore  al¬ 
lowing  to  consider  distributed  sources.  Although  these  systems  offer  a  basic  frame¬ 
work  for  supporting  reasoning,  they  are  characterized  by  two  major  drawbacks:  they 
lack  in  expressiveness ,  since  they  tend  to  produce  highly  specialized  terminologies, 
and  in  reusability ,  as  context  models  can  not  be  easily  aggregated  to  build  composite 
representations. 

Logic  approaches  manage  contexts  integrating  sensory  data  with  axioms  and  in¬ 
ference  rules.  However,  two  main  inspiring  principles  deserve  special  attention  [5]: 
(i)  locality :  reasoning  must  occur  within  a  domain  defined  by  a  context,  thereby 
establishing  preferred  links  among  selected  agents;  (ii)  cross-domains  bridging :  re¬ 
lationships  can  occur  between  reasoning  activities  belonging  to  different  contexts, 
thus  allowing  occasional  relationships  between  heterogeneous  agents.  Logic-based 
architectures  have  been  recently  proposed  for  smart  environments  (see,  for  instance, 
[1]):  however,  in  spite  of  their  powerful  reasoning  capabilities,  they  show  a  number 
of  limitations  with  respect  to  expressiveness ,  since  they  lack  in  grasping  the  profu¬ 
sion  of  relationships  among  the  involved  entities  in  a  compact  way. 
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3  Context  Modeling  and  Recognition 

3.1  Modeling  Sequences  of  Distributed  Events 

Event  templates  are  represented  using  ontologies ,  which  are  a  formal  representation 
of  semantically  related  symbols  cr^,  each  one  provided  with  a  formal  description  ^ 
and  a  specific  interpretation  or  meaning  Symbols  are  associated  with  composite 
descriptions  that  allow  to  define  symbols  through  simple  descriptions.  This  is  pos¬ 
sible  since  composite  descriptions  are  always  in  conjunctive  normal  form,  i.e.,  they 
are  a  collection  of  simple  descriptions  connected  by  an  and  operator1 : 

^ k  4  ^k,i  n ...  n  n ...  n 

Simple  descriptions  can  be  used  to  link  the  abstract  symbols  to  numerical,  or¬ 
dered  or  non-ordered  values.  In  fact,  each  simple  description  is  in  the  form  r^flvj), 
where  r  is  called  role  and  v  is  an  actual  value  the  role  can  assume.  In  other  words, 
roles  are  the  formal  means  by  which  to  incorporate  actual  sensory  data  within  the 
ontology.  Roles  define  the  meaning  that  the  data  value  plays  for  the  symbol.  For 
instance,  if  we  want  to  represent  the  fact  that  elevator  e  is  at  floor  f  we  can  define  a 
symbol  aef  associated  with  the  description  Q)ef  as  follows: 

aef  =  Q)ef  <—  iselevator(e)  \latfloor(f). 

Role  values  can  be  updated  at  run  time  as  soon  as  new  sensory  data  are  processed. 
If  we  assume  the  availability  of  proper  information  sources,  descriptions  such  as  Q)ef 
can  be  easily  grounded,  i.e.,  variables  e  and  /  can  be  replaced  with  actual  values2. 
In  our  case,  we  assume  that  intelligent  devices  are  in  charge  of  controlling  all  the 
elevators  e  the  robot  must  interact  with.  Therefore,  as  soon  as  e  switches  to  floor  / 
either  because  of  human  or  robot  requests,  a  grounded  formula  aef  of  the  type  Gef 
is  updated  in  the  ontology.  For  instance,  if  e  =  £45  and  /  =  7,  we  represent  the  fact 
that  elevator  £45  is  at  floor  7  as  follows: 

Gef  =  &tef  <—  iselevator(e 45)  \latfloor(l). 

It  is  worth  noticing  that  in  ontologies  aef  is  said  to  be  an  instance  of  Gef,  meaning 
that  all  the  roles  have  been  properly  filled  and  6e f  is  grounded.  Furthermore,  we  say 
that  Gef  holds  in  the  ontology,  meaning  that  it  corresponds  to  events  occurring  in 
the  real  world.  However,  this  is  also  a  special  case  of  subsumption.  Subsumption  is 
a  set  inclusion  operator  that  compares  descriptions:  $)\  subsumes  ^2  if  is  more 
general  (or  less  restrictive)  than  As  a  consequence  Q)ef  subsumes  Q)ef  because 
the  latter  is  grounded,  whereas  the  former  is  not: 

SUbs?  [S>eff  @ef]  • 

1  Ontologies  assume  the  availability  of  common  logic  connectors,  such  as  and  n,  or  U  and  not  -1. 

2  In  formal  languages,  this  is  usually  referred  to  as  variable  assignment. 
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Fig.  1  A  graphical  sketch  of  the  context  model. 


Since  descriptions  are  in  conjunctive  normal  form,  any  simple  description  sub¬ 
sumes  the  composite  description  it  contributes  to3:  subs  ?  [iselevator(e) ,  S>ef]  • 

The  ontology  represents  primitives  that  can  be  used  to  build  templates  of  se¬ 
quences  of  events  whose  occurrence  must  be  detected  at  run  time.  We  call  each  se¬ 
quence  Situation ,  and  we  define  it  as  a  description  that  involves  the  coordination  of 
many  agents.  For  instance,  in  Section  4  we  model  the  coordination  between  mobile 
robots  and  elevators  using  a  Situation ,  since  at  least  two  agents  are  involved.  Each 
Situation  is  made  up  of  several  composite  descriptions,  each  one  related  to  an  agent 
involved  in  the  Situation :  the  goal  of  this  level  is  to  explicitly  represent  coordina¬ 
tion  between  agents.  On  the  other  hand,  we  define  each  composite  descriptions  that 
refers  to  a  single  agent  as  a  Context.  An  example  of  Context  is  oef,  since  it  depicts 
the  status  of  a  single  agent  (i.e.,  the  elevator  e).  Finally,  each  composite  description 
is  made  up  of  one  or  more  simple  descriptions,  which  express  the  single  piece  of 
knowledge  necessary  to  characterize  the  Context.  This  basic  element  is  called  Pred¬ 
icate ,  and  it  is  always  represented  in  the  form  r^(yy),  since  it  links  contexts  to  actual 
sources  of  information  (i.e.,  raw  sensory  data  as  well  as  cognitive  algorithms  that 
perform  complex  operations  on  data).  Figure  1  shows  a  graphical  representation  of 
this  model:  the  black  circles  represent  sources  of  informations  that,  through  roles, 
update  predicates  and  then  contexts  and  situations  (grey  circles  in  the  Figure). 

In  order  to  model  situations  that  evolve  over  time,  the  context  model  is  provided 
with  a  collection  of  temporal  operators  acting  upon  symbol  descriptions,  which  re¬ 
quire  to  explicitly  introduce  time  dependence.  Temporal  operators  monitor  the  dy¬ 
namics  associated  to  descriptions  holding  in  the  ontology,  and  therefore  the  cor¬ 
responding  sequence  of  events  in  the  real  world.  Temporal  operators  can  be  thus 
applied  to  Predicate ,  Context  and  Situation  symbols  indiffently: 


3  This  principle  can  be  extended  to  any  subset  of  simple  descriptions  in  conjunctive  normal  form 
with  respect  to  more  complex  composite  descriptions. 
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Fig.  2  The  context  assessment  process. 


•  Change.  The  two  operators  positive  change  and  negative  change  {true\ false}  <— 

\(?k)  and  {true\false}  <—  (o^)  are  defined,  which  monitor  the  time 

instant  T  when  the  holding  status  of  a \  changes. 

•  Longer  Than.  The  operator  longer  than  {true\false}  AT(cr^,?)  is  defined, 
which  monitors  when  the  holding  period  of  a \  exceed  a  given  threshold  t. 

•  Timeline.  The  operator  timeline  {true,  false}  -<  (oj^ ,  cr^2)  is  defined,  which  de¬ 
tects  two  consecutive  holding  symbols  <7^  and  cr^2 . 

For  instance,  if  we  want  to  represent  the  fact  that  elevator  e  is  first  at  floor  fa, 
and  then  at  floor  fa,  with  e  =  £45,  f\  =7  and  fa  =  3,  we  can  define  a  symbol  <7^73 
associated  with  the  description  ^45  73  as  follows: 

<7e45  73  =  ^e4573  iselevator(e 45)  \latfloor{l)  -<  atfloor(3). 

In  order  to  implement  these  operators  in  ontologies,  symbols  in  the  context  model 
account  for  two  specific  roles,  namely  startsat  and  endsat ,  which  specify  the  tem¬ 
poral  interval  when  the  corresponding  description  holds  in  the  ontology.  By  oppor¬ 
tunely  reasoning  upon  the  two  roles  through  common  mathematical  operators,  such 
as  <,  >  and  =,  justified  as  syntactic  sugar ,  it  is  then  possible  to  directly  encode  the 
previously  defined  temporal  operators  into  ontologies  [13]. 


3.2  Context  Assessment 

The  process  of  monitoring  the  coordinated  activity  of  many  agents  is  managed 
through  subsumption,  and  in  particular  by  iteratively  aggregating  grounded  descrip¬ 
tions  to  verify  which  symbols  in  the  ontology  hold  (see  Figure  2).  Among  the  possi¬ 
ble  symbols,  we  are  interested  in  checking  those  Situation  templates  that  constitute 
sequences  of  events  to  monitor.  At  the  time  instant  T,  as  soon  as  updated  sensory 
data  are  available,  these  are  used  to  update  Predicate  descriptions.  These  instances 
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Fig.  3  Mobile  robots  operating  at  the  Polyclinic  of  Modena. 


are  said  to  hold  in  the  ontology  in  T.  However,  previous  Predicate  instances  are  not 
lost.  On  the  contrary,  a  special  grounded  symbol  in  the  ontology  -  referred  to  as 
history  at  the  Predicate  level  -  is  maintained,  which  is  defined  by  joining  all  the 
simple  grounded  descriptions  that  held  in  previous  time  instants,  and  in  particular 
within  a  predefined  temporal  window,  with  those  holding  at  time  T  (see  Figure  2 
on  the  bottom).  Furthermore,  all  the  relevant  descriptions  of  temporal  relationships 
(i.e.,  change ,  longer  than  and  timeline)  holding  at  the  time  instant  T  between  all 
the  grounded  descriptions  in  history  are  joined  to  history  itself.  For  instance,  if  we 
recall  the  previous  example  about  symbol  <7^73,  history  is  likely  to  include  both 
atfloor(l)  and  atfloor( 3):  as  a  consequence,  the  evaluation  of  temporal  operators 
will  include  in  history  at  floor  (7)  -<  at  floor  ( 3)  as  well. 

At  this  point,  subsumption  is  used  to  detect  which  Context  symbols  hold  at  time 
instant  T.  This  is  achieved  by  checking  subs?  [  Ok ,  history  ]  for  each  symbol  Ok 
corresponding  to  a  composite  description  referring  to  a  single  agent.  Analogously 
to  what  happens  for  Predicate  symbols,  holding  Context  instances  as  well  as  all  the 
relevant  temporal  relationships  holding  between  them  in  T  are  joined  together  in  the 
history  at  the  Context  level.  Since  -  in  principle  -  there  are  many  levels  of  Context 
symbols  (depending  on  their  definitions),  this  mechanism  is  iteratively  applied. 

Finally,  subsumption  is  again  used  to  detect  which  Situation  symbols  hold  at 
time  instant  T.  This  is  achieved  by  checking  subs ?  [Ok,  history ]  for  each  symbol 
Ok  corresponding  to  a  composite  description  referring  to  many  interacting  agents. 
Holding  Situation  instances  are  said  to  correspond  to  real  world  cooperation  activi¬ 
ties  among  different  agents. 


4  Experimental  Evaluation 
4.1  Application  Scenario 

The  proposed  model  has  been  applied  to  monitor  various  coordination  tasks  be¬ 
tween  a  team  of  service  robots  and  intelligent  devices  in  a  hospital  environment, 
specifically  the  Polyclinic  of  Modena,  Italy  (see  Figure  3).  Robots  are  requested  to 
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transport  biologic  waste  and  other  materials  between  areas  located  at  different  floors 
and  the  waste  storage  site.  In  order  to  reach  the  storage  site,  robots  must  navigate  in 
a  complex  and  crowded  building,  where  other  mobile  robots,  hand-guided  vehicles, 
stretchers  and  humans  are  often  encountered.  Each  robot  is  provided  with  state-of- 
the-art  techniques  for  self-localization  [11],  navigation  [17]  and  planning  [10].  The 
operating  environment  has  been  augmented  with  a  network  of  intelligent  devices: 
robots  and  intelligent  devices  are  networked  and  able  to  communicate  with  each 
other,  and  to  provide  the  context  model  with  the  necessary  information  to  monitor 
their  behaviour. 


4.2  Monitoring  Robot-Elevator  Coordination 


Rationale.  In  order  to  monitor  the  interaction  between  mobile  robots  and  elevators,  it 
is  necessary  to  define  all  the  involved  variables  and  their  types.  We  want  to  monitor 
the  coordination  between  a  robot  r  and  an  elevator  e  in  order  to  switch  between  the 
two  floors  fi  and  fj.  The  robot  is  assumed  to  be  initially  located  outside  the  elevator 
in  an  area  at  at  floor  f ,  whereas  the  process  is  concluded  when  the  robot  is  located 
in  an  area  aj  at  floor  fj\  the  elevator  is  associated  with  an  area  ae. 

Situation  Modeling.  Adopting  the  previously  defined  variables,  and  referring  to 
a  generic  robot,  elevator,  area  and  floor  respectively  with  r,  e ,  a  and  /,  Predicate 
symbols  are  defined  as:  iselevator(e),  which  is  provided  by  the  intelligent  device 
associated  with  the  elevator  e ,  isrobot{r ),  which  originates  from  mobile  robot  r, 
atfloor(f ),  which  can  be  provided  either  by  robots  or  elevators  and  atlocation(a ), 
originated  from  mobile  robots  as  they  move  within  the  environment.  These  sym¬ 
bols  are  used  to  define  Context  symbols,  as  follows:  Gef  =  &>ef  <—  iselevator(e)  n 
at  floor  (f)  is  used  to  model  the  fact  that  elevator  e  is  at  floor  /;  orf  =  $>ef  <— 
isrobot{r)  Flat  floor  (/)  is  used  to  describe  a  context  where  robot  r  is  at  floor  /:  it 
is  worth  noticing  that  both  Gef  and  Grf  are  subsumed  by  a  more  generic  a af,  which 
expresses  that  the  agent  a  is  at  floor  /;  oea  =  Q)ea  <—  iselevator(e)  \latlocation(a ) 
describes  that  the  elevator  e  is  located  in  an  area  a  (with  respect  to  a  properly  defined 
coordinate  frame);  Gra  =  f$ra  <—  isrobot(r )  n  atlocation(a)  is  used  to  describe  the 
fact  that  robot  r  is  located  in  an  area  a  (differently  from  elevators,  the  area  changes 
as  soon  as  the  robot  navigates  in  the  environment):  again,  Gea  and  Gra  are  subsumed 
by  Gaa ,  since  the  generic  agent  is  located  in  a ;  orfa  =  @rfa^~  Grf\l  ora  is  a  compos¬ 
ite  Context  that  provides  information  about  both  floor  /  and  location  a  for  a  given 
robot  r. 

The  previously  defined  symbols  are  then  used  to  define  more  complex  Situation 
templates  as  follows: 

1 .  crea  =  &rea  Ora  l~1  Gea  is  exploited  to  model  the  fact  that  robot  r  is  inside  ele¬ 
vator  e  at  location  a. 

2.  orefai  e  =  ^refai  e  <  orj  IH  Gef  -K  <5^  ( Gra- )  8^  ( orecie )  describes  the  fact  that 

both  the  robot  r  and  the  elevator  e  are  at  floor  /,  then  the  robot  enters  the  elevator, 
as  it  moves  from  at  to  ae. 
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3-  Grefij  —  ~^refi,j  i  ( Gefi )  ^t2  ( Gefj )  ^  ^3  ( Gyfi )  ^4  iGrfj)  ^  Used  to 

model  that  both  the  elevator  £  and  the  robot  r  are  detected  to  switch  from  f  to  fj. 

4.  Orefaej  =  ^ refaej  <  (o rf  H  ®ef)  8T2  ^  ( ^reae  )  8{3  (& raj )  monitors  the 

fact  that  first  both  robot  r  and  elevator  e  are  detected  to  be  at  floor  /,  and  then  the 
robot  leaves  the  elevator. 

Using  these  Situation  symbols,  it  is  possible  to  model  the  whole  interaction  be¬ 
tween  a  mobile  robot  r  and  an  elevator  e,  switching  from  floor  f)  to  fj  as  follows: 

Grefij  —  ^refij  arfiCii  GrefiaLe  ~<  arefLj  ~<  & refjaej  • 

Situation  Recognition.  In  our  scenario,  two  data  sources  contribute  to  the  detec¬ 
tion  of  this  Situation ,  i.e.,  the  robot  s\  and  the  elevator  e 45  involved  in  the  cooper¬ 
ation  (see  Figure  3  on  the  right).  Periodically,  these  provide  instances  of  Predicate 
symbols  such  as  isrobot(s\),  is  el  ev  at  or  (e  45),  at  floor  (7)  or  at  floor  (3)  (provided 
that  we  are  interested  in  these  two  floors)  and  atlocation(a 23)  (outside  ^45  at  floor 
7),  atlocation(a45 )  (inside  the  elevator)  and  atlocation(a$  1)  (outside  £45  at  floor  3). 
At  the  beginning,  we  assume  s\  at  floor  7  and  located  in  <223.  As  a  consequence, 
Context  &Sl7a23  (subsumed  by  crfa)  holds.  Then,  ^45  is  detected  at  floor  7:  therefore 
<7g457  holds.  Next,  s\  enters  elevator  ^45,  yielding  that  it  does  not  hold  anymore  for 
s\  to  be  located  in  <223  (8^^  {cSaai3),  subsumed  by  <5£p^(<7ra/))  and  becomes  true 
that  it  is  inside  <245  (8^  (aSie 45a45),  subsumed  by  (creae))\  as  a  consequence, 
6sie457a23  45  (subsumed  by  o refai .)  holds.  When  s\  issues  the  command  for  switching 
to  floor  3,  similar  considerations  hold  for  the  grounded  Situation  oSle45 73  (subsumed 
by  Grefij)i  which  holds  when  both  s\  and  ^45  are  detected  first  at  floor  7,  and  then 
at  floor  3.  Finally,  when  54  exits  from  £45,  it  is  detected  first  in  area  <245  and  then 
in  area  <251:  when  this  happens,  the  grounded  Situation  6Sie453a45  5i  (subsumed  by 
Gre fae j )  holds,  thereby  satisfying  the  robot-elevator  coordination  situation  a refop 
since  constituting  Situation  symbols  are  detected  in  the  proper  temporal  order. 


5  Conclusion 


This  paper  describes  how  to  monitor  the  coordinated  action  of  different  agents  in 
Ubiquitous  Robotics  scenarios,  specifically  using  context-assessment  strategies  bor¬ 
rowed  from  context  modeling  and  recognition  techniques.  The  proposed  framework 
addresses  practical  requirements  for  contexts  models:  (i)  effectiveness :  the  intro¬ 
duced  architecture  is  aimed  at  directly  mapping  data  sources  to  abstract  symbols, 
thereby  assessing  further  reasoning  processes;  (ii)  reusability :  being  hierarchically 
structured,  selected  concepts  and  relationships  can  be  composed  to  build  differ¬ 
ent  monitoring  representations;  (iii)  expressiveness :  since  tractability  of  inference 
is  a  fundamental  prerequisite,  the  system  is  limited  to  simple  temporal  relationships 
among  events;  however,  practice  suggests  that  it  is  possible  to  model  a  wide  range 
of  contexts  and  situations  for  a  broader  spectrum  of  artificial  cognitive  systems. 
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Long  Term  Learning  and  Online  Robot 
Behavior  Adaptation  for  Individuals  with 
Physical  and  Cognitive  Impairments 


Adriana  Tapus,  Cristian  Tapus  and  Maja  Mataric 


Abstract  In  this  paper,  we  present  an  online  adaptation  approach  and  a  long-term 
learning  approach  for  socially  assistive  robotic  (SAR)  systems  that  aim  to  provide 
customized  help  protocols  through  motivation,  encouragements,  and  companionship 
to  users  suffering  from  physical  and/or  cognitive  changes  related  to  stroke,  aging  and 
Alzheimer’s  disease. 


1  Introduction 


A  recent  trend  in  robotics  is  to  develop  a  new  generation  of  robots  that  are  capable 
of  moving  and  acting  in  human-centered  environments,  interacting  with  people,  and 
participating  in  our  daily  lives.  This  has  introduced  the  need  for  developing  robotic 
systems  able  to  learn  how  to  use  their  bodies  to  communicate  and  to  react  to  their 
users  in  a  social  and  engaging  way.  Social  robots  that  interact  with  humans  have 
thus  become  an  important  focus  of  robotics  research. 

Research  into  Human-Robot  Interaction  (HRI)  for  socially  assistive  applications 
is  in  its  infancy.  Socially  assistive  robotics  [4]  is  an  interdisciplinary  and  increas¬ 
ingly  popular  research  area  that  brings  together  insights  from  a  broad  spectrum  of 
fields,  including  robotics,  health,  social  and  cognitive  sciences,  and  neuroscience, 
among  others. 

It  is  estimated  that  in  2050  there  will  be  three  times  more  people  over  the  age 
85  than  there  are  today  [1].  Most  of  the  ageing  population  is  expected  to  need 
physical  and/or  cognitive  assistance.  As  the  elderly  population  continues  to  grow, 
new  research  has  been  dedicated  to  developing  assistive  systems  aimed  at  promot- 
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in g  ageing-in-place,  facilitating  living  independently  in  one’s  own  home  as  long 
as  possible,  and  helping  caregivers  and  doctors  to  provide  long-term  rehabilita¬ 
tion/cognitive  stimulation  protocols.  The  first  efforts  towards  having  socially  assis¬ 
tive  robotic  systems  for  the  elderly  have  been  focused  towards  constructing  robot- 
pet  companions  aimed  at  reducing  stress  and  depression  [5],  [12],  [7],  [6],  and  [8]. 
In  addition  to  the  growing  elderly  population,  other  large  user  populations  represent 
ideal  beneficiaries  of  socially  interactive  assistive  robotics.  Those  include  individ¬ 
uals  with  physical  impairments  and  those  in  rehabilitation  therapy,  where  socially 
assistive  technology  can  serve  to  improve  not  only  mobility  [13],  [2]  [3]  but  also  for 
outcomes  in  recovery.  Finally,  individuals  with  cognitive  disabilities  and  develop¬ 
mental  and  social  disorders  (e.g.,  autism  [9])  constitute  another  growing  population 
that  could  benefit  from  assistive  robotics  in  the  context  of  special  education,  therapy, 
and  training. 

In  order  to  be  able  to  aid  the  target  user  populations,  an  effective  socially  interac¬ 
tive  assistive  robot  must  understand  and  interact  with  its  environment,  exhibit  social 
behavior,  and  focus  its  attention  and  communication  on  the  user  in  order  to  help  the 
user  achieve  specific  goals.  Social  behavior  plays  an  important  role  in  the  assistance 
of  people  with  special  needs.  An  adaptive,  reliable  and  user-friendly  hands-off  ther¬ 
apist  robot  can  provide  an  engaging  and  motivating  customized  therapy  protocol  to 
participants  in  laboratory,  clinic,  and  ultimately,  home  environments,  and  can  estab¬ 
lish  a  very  complex  and  complete  human-robot  relationship.  Therefore,  such  robots 
must  be  endowed  with  human-oriented  interaction  skills  and  capabilities  to  learn 
from  us  or  to  teach  us,  as  well  as  to  communicate  with  us  and  understand  us.  Hence, 
the  work  proposed  here  will  focus  on  robot  behavior  adaptation  to  user’s  personal¬ 
ity,  preferences  and  disability  level,  aiming  toward  a  long-term  customized  therapy 
protocol  for  stroke  rehabilitation  and  other  elderly  specific  application  domains. 

This  paper  presents  two  learning  approaches,  one  based  on  on-line  adaptation 
and  the  other  based  on  long-term  learning,  for  socially  assistive  robots  designed  for 
helping  stroke  patients  and  people  suffering  of  age-related  cognitive  impairments 
(i.e.,  dementia).  The  rest  of  the  paper  is  structured  as  follows.  Section  2  illustrates 
the  robotic  test-bed.  Section  3  describes  the  online  learning  and  behavior  adaptation 
approach  and  its  validation  in  a  rehabilitation-like  context  and  Section  4  describes 
the  long-term  learning  approach  and  its  validation  in  a  study  with  patients  with 
dementia.  Section  5  concludes  the  paper. 


2  Experimental  Platform 

The  experimental  testbed  used  was  a  custom-designed  humanoid  torso  robot  mounted 
on  a  mobile  robot  base  (Figure  1).  The  mobile  base  was  an  ActivMedia  Pioneer  2DX 
robot  equipped  with  a  speaker,  a  Sony  Pan-Tilt-Zoom  (PTZ)  color  camera,  and  a 
SICK  LMS200  eye-safe  laser  range  finder.  The  anthropomorphic  setup  involved  a 
humanoid  Bandit  II  torso,  consisting  of  22  controllable  degrees  of  freedom,  which 
included:  6  DOF  arms  (x2),  1  DOF  gripping  hands  (x2),  2  DOF  pan/tilt  neck,  2 
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DOF  pan/tilt  waist,  1  DOF  expressive  eyebrows,  and  a  3  DOF  expressive  mouth. 
All  actuators  were  servos  allowing  for  gradual  control  of  the  physical  and  facial 
expressions.  We  are  interested  in  utilizing  the  humanoid’s  anthropomorphic  but  not 
highly  realistic  appearance  as  a  means  of  establishing  user  engagement,  and  com¬ 
paring  its  impact  to  our  prior  work  with  non-biomimetic  robot  test-beds  [11]. 


Fig.  1:  Robot  test-bed:  Bandit  II  humanoid  torso  mounted  on  the  Pioneer  mobile  base 


3  Study  1 

3.1  Robot  Learning  and  Behavior  Adaptation  to  User  Personality 
and  Preferences 


The  main  goal  of  the  first  implemented  methodology  was  to  develop  a  robot  be¬ 
havior  adaptation  system  that  allows  for  dynamically  optimizing  three  main  inter¬ 
actional  parameters  (in  our  case:  interaction  distance/proxemics,  speed,  and  vocal 
content)  so  as  to  adapt  to  the  user’s  personality  toward  improving  the  user’s  task 
performance.  These  parameters  defined  the  behavior  (and  thus  personality)  of  the 
’’therapist”  robot.  Task  performance  is  measured  as  the  number  of  exercises  per¬ 
formed  in  a  given  period  of  time;  the  learning  system  changed  the  robot’s  person¬ 
ality,  expressed  through  the  robot’s  behavior,  in  an  attempt  to  maximize  the  task 
performance  metric. 

A  learning  algorithm  based  on  policy  gradient  reinforcement  learning  (PGRL) 
was  developed.  The  n-dimensional  policy  gradient  algorithm  implemented  for  this 
work  starts  from  an  initial  policy  71  =  {0i,  02,  •  •  • ,  0W}  (where  n  =  3  in  our  case). 
For  each  parameter  0;  we  also  defined  a  perturbation  step  £;  to  be  used  in  the  adap¬ 
tation  process.  The  perturbation  step  defined  the  amount  by  which  the  parameter 
may  vary  to  provide  a  gradual  migration  towards  the  local  optimum  policy.  The  use 
of  PGRL  required  the  creation  of  a  reward  function  to  evaluate  the  behavior  of  the 
robot  as  parameters  changed  to  guide  it  toward  the  optimum  policy.  The  algorithm 
consisted  of  the  following  steps:  (a)  parametrization  of  the  behavior  initial  policy 
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7r;  (b)  approximation  of  the  gradient  of  the  reward  function  in  the  parameter  space; 
(c)  movement  towards  a  local  optimum. 

The  reward  function  was  monitored  to  prevent  it  from  falling  under  a  given 
threshold,  which  would  indicate  that  the  robot’s  behavior  at  the  time  did  not  pro¬ 
vide  the  user  with  an  ideal  exercise  scenario.  This  triggered  the  activation  of  the 
PGRL  adaptive  algorithm  phase  to  adapt  the  behavior  of  the  robot  to  the  continually- 
changing  factors  that  determined  the  user’s  task  performance.  More  details  about 
this  work  can  be  found  in  [1 1]. 


3.2  Experimental  design  for  learning  in  the  physical  exercise 
context 

We  endeavored  to  develop  an  experimental  design  for  a  study  involving  stroke  pa¬ 
tients,  and  validate  it  first  with  non-patients,  in  a  lab  setting,  in  order  to  test  the  adap¬ 
tation  algorithm.  In  the  experimental  design,  the  participant  stands  or  sits  facing  the 
robot.  The  experimental  task  is  a  common  object  transfer  task  used  in  post-stroke 
rehabilitation  and  consists  of  moving  pencils  from  one  bin  on  the  left  side  of  the  par¬ 
ticipant  to  another  bin  on  his/her  right  side.  The  bin  on  the  right  is  on  an  electronic 
scale  in  order  to  measure  the  participant’s  task  performance.  The  system  monitors 
the  number  of  exercises  performed.  The  participants  are  asked  to  perform  the  task 
for  a  fixed  amount  of  time  (15  minutes  for  healthy  adults,  6  minutes  for  stroke  pa¬ 
tients),  but  they  can  stop  the  experiments  at  any  time.  At  the  end  of  each  experiment 
session,  the  experimenter  presented  a  short  debriefing.  Before  starting  the  experi¬ 
ments,  the  participants  are  asked  to  complete  two  questionnaires:  (1)  a  general  in¬ 
troductory  questionnaire  in  which  personal  details  such  as  gender,  age,  occupation, 
and  educational  background  were  determined  and  (2)  a  personality  questionnaire 
based  on  the  Eysenck  Personality  Inventory  (EPI)  for  establishing  the  user’s  per¬ 
sonality  traits. 

The  learning  algorithm  is  initialized  with  parameter  values  that  are  in  the  vicinity 
of  what  is  thought  to  be  acceptable  for  both  extroverted  and  introverted  individuals, 
based  on  the  user-robot  personality  matching  study  described  in  [10].  The  PGRL 
algorithm  evaluates  the  performance  of  each  policy  over  a  period  of  60  seconds. 
The  reward  function,  which  counts  the  number  of  exercises  performed  by  the  user 
in  the  past  15  seconds  is  computed  every  second  and  the  results  over  the  60  seconds 
“steady”  period  are  averaged  to  provide  the  final  evaluation  for  each  policy.  The 
threshold  for  the  reward  function  that  triggers  the  adaptation  phase  of  the  algorithm 
is  adjusted  to  account  for  the  fatigue  incurred  by  the  participant.  The  threshold  and 
the  time  ranges  are  all  customizable  parameters. 

In  the  post-experiment  survey,  the  participants  are  be  asked  to  provide  their  pref¬ 
erences  related  to  the  therapy  styles  or  robot’s  vocal  cues,  interaction  distances,  and 
robot’s  speed  from  the  values  used  in  the  experiments. 

We  designed  four  different  scenarios  for  extroverted  and  introverted  personality 
types;  the  therapy  styles  ranged  from  coach-like  therapy  to  encouragement-based 
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therapy  for  extroverted  personality  types  and  from  supportive  therapy  to  nurtur¬ 
ing  therapy  for  introverted  personality  types.  We  chose  to  use  pre-recorded  speech 
and  selected  words  and  phrases  for  each  of  these  scenarios  in  concordance  with  en¬ 
couragement  language  used  by  professional  rehabilitation  therapists.  The  challenge- 
based  therapy  script  is  composed  of  assertive  language  (e.g.,  “Keep  going!”  and 
“You  can  do  more  than  that!”).  Extoversion  is  also  expressed  with  higher  speech 
volume  and  faster  speech  rate.  The  aggressiveness  of  words,  volume,  and  speech 
rate  are  adjusted  to  diminish  along  with  the  robot’s  movement  towards  the  nurturing 
therapy  style  of  the  interaction  spectrum.  In  contrast  to  the  challenge-based  script, 
the  nurturing  therapy  script  contains  empathetic,  gentle,  and  comforting  language 
(e.g.,  “I’m  glad  you  are  working  so  well.”,  “I’m  here  for  you.”,  “Please  continue  just 
like  that”,  “I  hope  it’s  not  too  hard”).  The  speech  uses  lower  volume  and  pitch.  The 
transition  from  one  personality-based  therapy  style  to  another  is  done  smoothly  (see 
algorithm  above)  in  order  to  avoid  any  jarring  influence  on  the  human-robot  inter¬ 
action.  We  chose  a  set  of  three  interaction  distances  and  speeds  for  each  introverted 
and  extroverted  personality  type. 


3.3  Experimental  results  for  the  physical  exercise  context 


We  performed  the  above-described  experiment  in  a  lab-like  setting,  with  12  non¬ 
patient,  healthy  adult  participants  (7  male,  5  female).  The  participants  ranged  in  age 
between  19  and  35;  27%  were  from  a  non-technological  field,  while  73%  worked  in 
a  technology-related  area. 

As  shown  in  Figure  2,  the  robot  adapted  to  match  the  preference  of  the  participant 
in  almost  every  case.  The  only  exception  was  the  interaction  with  participant  8. 
Despite  the  fact  that  the  time  spent  in  the  preferred  training  style  of  that  participant 
was  smaller  than  the  time  spent  in  other  training  styles,  the  robot  converged  to  it  at 
the  end  of  the  exercise  period.  This  was  caused  by  the  fact  that  the  initial  state  of  the 
robot  was  in  a  training  style  that  was  furthest  from  the  participant’s  preference. 

The  pilot  results  we  obtained  support  our  hypothesis  that  the  robot  could  adapt 
its  behavior  to  both  introverted  and  extroverted  participants.  Further  details  about 
this  work  can  be  found  in  [1 1]. 


4  Study  2 

4.1  Robot  Learning  and  Behavior  Adaptation  to  User 
Ability/Performance 


The  second  learning  and  robot  behaviour  adaptation  methodology  was  designed 
for  the  interaction  between  the  robot  and  a  user  with  dementia  and/or  Alzheimer’s 
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Fig.  2:  The  percentage  of  time  that  the  12  participants  interacted  with  each  of  the  four  therapy 
styles  of  the  robot.  The  crosses  represent  the  participants’  preferences. 


disease,  with  the  main  goal  of  helping  users  improve  or  maintain  their  cognitive 
attention  through  encouragements  in  a  music-based  cognitive  stimulation  game. 

This  approach  consists  of  two  parts:  supervised  learning  and  adaptation.  The 
robot  models  the  level  of  game  challenge  that  can  be:  (a)  Difficult:  no  hints;  (b) 
Medium:  when  the  song  excerpt  starts  say  “push  the  button”  but  do  not  indicate 
which  button  to  push;  and  (c)  Easy:  when  the  song  excerpt  starts  say  which  button 
to  push.  The  supervised  learning  system  learns  the  Accepted  Variation  Band  (AVB) 
for  each  game  level  and  for  each  disability  bucket  (mild,  moderate,  and  severe), 
as  a  function  of  the  user’s  task  performance.  The  learning  phase  is  followed  by  an 
adaptation  phase,  where  the  robot  adapts  its  behavior  so  as  to  minimize  the  user’s 
reaction  time  and  maximize  the  correctness  of  the  user’s  answers.  If  the  user’s  task 
performance  is  below  the  Accepted  Variation  Band,  the  user  is  performing  better 
than  during  the  learning  phase.  The  user  is  then  promoted  to  the  next  level  of  game 
difficulty  (if  not  already  at  the  Difficult  level).  If  the  user’s  task  performance  is  above 
the  Accepted  Variation  Band,  the  user  is  not  performing  well  enough.  The  user  is 
then  helped  by  having  the  game  difficulty  level  is  decreased  (if  not  already  at  the 
Easy  level). 


4.2  Experimental  design  for  cognition  exercise  context 

The  experiment  consists  of  repeated  sessions,  during  which  the  user  and  the  robot 
interact  in  the  context  of  a  cognitive  game.  The  first  session  is  the  orientation,  in 
which  the  participant  is  ‘introduced’  to  the  robot.  The  robot  is  brought  into  the  room 
with  the  participant,  but  is  not  powered  on.  During  this  introduction  period,  the  ex¬ 
perimenter  or  the  participant’s  nurse/physical/music  therapist  explains  the  robot’s 
behavior,  the  overall  goals  and  plans  of  the  study,  and  what  to  expect  in  future 
sessions.  The  participant  is  also  asked  about  his/her  favorite  songs  from  a  variety 
popular  tunes  from  the  appropriate  time  period;  those  songs  are  later  used  in  the 
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subsequent  sessions.  At  the  end  of  the  session,  the  Standardized  Mini-Mental  State 
Examination  (SMMSE)  cognitive  test  is  administered  so  as  to  determine  the  partic¬ 
ipant’s  level  of  cognitive  impairment  and  the  stage  of  dementia.  This  test  provides 
information  about  the  cognitive  (e.g.,  memory  recall)  level  of  impairment  of  the 
participant  for  use  in  initializing  the  game  challenge  level.  The  data  determine  the 
participant’s  initial  mental  state  and  level  of  cognitive  impairment,  and  serve  as  a 
pre-test  for  subsequent  end-of-study  comparison  with  a  post-test. 

This  experiment  is  designed  to  improve  the  participant’s  level  of  attention  and 
consists  of  a  cognitive  game  called  Song  Discovery  or  Name  That  Tune.  The  par¬ 
ticipant  is  asked  to  find  the  right  button  for  the  song,  press  it,  say  the  name  of  the 
song,  and  sing  along.  The  criteria  for  participation  in  the  experiment  (in  addition  to 
the  Alzheimer’s  or  dementia  diagnosis)  include  the  ability  to  read  large  print  and 
to  press  a  button.  The  participant  sits  in  front  of  a  vertical  experimental  board  with 
5  large  buttons  (e.g.,  the  Staples  EASY  buttons).  Four  buttons  correspond  to  the 
different  song  excerpts  (chosen  as  a  function  of  the  user’s  preference)  and  the  last 
button  corresponds  to  the  SILENCE  or  no  song  excerpt  condition.  Under  each  but¬ 
ton,  a  label  with  the  name  of  the  song  (or  SILENCE)  is  printed.  The  robot  describes 
to  each  participant  the  goal  of  the  game  before  each  session,  based  on  the  following 
transcript:  “We  will  play  a  new  music  game.  In  it,  we  will  play  a  music  collection 
of  4  songs.  The  songs  are  separated  by  silence.  You  will  have  to  listen  to  the  music 
and  push  the  button  corresponding  to  the  name  of  the  song  being  played.  Press  the 
button  marked  “SILENCE”  during  the  silence  period  between  the  songs.  The  robot 
will  encourage  you  to  find  the  correct  song.”  Each  participant  is  first  asked  by  the 
music  therapist  or  the  robot  to  read  aloud  the  titles  of  the  songs  and  to  press  a  button. 
Some  additional  directions  are  given.  The  participant  is  also  directed  to  press  the  SI¬ 
LENCE  button  when  there  is  no  music  playing.  After  a  review  of  the  directions,  the 
participant  is  asked  by  the  robot  to  begin  the  music  game.  The  music  compilation  is 
composed  of  a  random  set  of  song  excerpts  out  of  the  four  different  songs  that  form 
the  selection  and  the  silence  condition.  The  entire  music  compilation  lasts  between 
10  and  20  minutes,  and  is  based  on  the  user’s  level  of  cognitive  impairment:  the 
larger  the  impairment,  the  shorter  the  session.  A  song  excerpt  can  be  vocal,  instru¬ 
mental,  or  both.  The  order  of  song  excerpts  is  random.  The  experiment  was  repeated 
once  per  week  for  a  period  of  8  months  in  order  to  capture  longer-term  effects  of  the 
robot  therapist.  A  within- subject  comparison  was  performed  to  track  any  improve¬ 
ment  over  multiple  sessions.  No  between- subject  analysis  was  done  due  to  the  small 
sample  size  and  large  differences  in  cognitive  ability  levels. 


4.3  Experimental  results  for  the  cognitive  exercise  context 

The  initial  pilot  experimental  group  consisted  of  9  participants  (4  male,  5  female), 
from  our  partner  Silverado  Senior  Living  care  facility.  All  the  participants  were 
seniors  over  70  years  old  suffering  of  cognitive  impairment  and/or  Alzheimer’s  dis¬ 
ease.  The  cognitive  scores  assessed  by  the  SMMSE  test  were  as  follows:  1  mild,  1 
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moderate,  and  7  severe.  Due  to  the  total  unresponsiveness  of  6  of  the  severely  af¬ 
fected  participants,  only  1  severely  cognitively  disabled  participant  was  retained  for 
the  rest  of  the  study,  resulting  in  a  final  participant  group  composed  of  3  participants 
(all  female). 


Fig.  3:  Human-robot  interacting  during  the  music  game:  the  robot  gives  hints  related  to  the  music 
game,  the  user  answers,  and  the  robot  congratulates  and  applauds  the  correct  answer 


We  constructed  the  training  data  and  built  a  model  for  each  cognitive  disability 
level  and  for  each  game  level.  The  participants  played  each  game  level  10  times 
(stages)  in  order  to  construct  a  robust  training  corpus. 

The  results  obtained  over  6  months  of  robot  interaction  (excluding  the  2  months 
of  learning)  suggest  that  the  elderly  people  suffering  of  dementia  and/or  Alzheimer’s 
can  sustain  attention  to  music  across  a  long  period  of  time  (i.e.,  on  average  20  min¬ 
utes  for  mildly  impaired  participants,  14  minutes  for  moderately  impaired  partici¬ 
pants,  and  10  minutes  for  severely  impaired  participants)  of  listening  activity  de¬ 
signed  for  the  dementia  and/or  Alzheimer’s  population.  Figures  4a,  4b,  and  4c  illus¬ 
trate  the  evolution  of  the  game  difficulty  over  time,  as  well  as  response  incorrectness 
and  reaction  time  for  user  Jd  1 . 

Outcomes  are  quantified  by  evaluating  task  performance  and  time  on  task.  Based 
on  the  results  we  obtained,  it  can  be  concluded  that  the  SAR  system  was  able  to 
adapt  the  challenge  level  of  the  game  it  was  presenting  to  the  user  in  order  to  en¬ 
courage  task  improvement  and  attention  training.  Figure  4a  shows  the  evolution  in 
time  of  the  game  level  for  user_id  1 .  The  participant  started  at  the  easy  game  level 
and  remained  there  for  several  sessions.  The  participant  then  started  to  perform  bet¬ 
ter  and  diminished  the  reaction  time  and  reduced  the  number  of  incorrect  answers, 
which,  in  turn,  resulted  in  a  game  level  evolution  from  the  easy  level  to  difficult. 
Starting  from  the  22nd  trial,  the  participant  consistently  remained  at  the  highest 
level  of  difficulty  in  the  game  (see  Figure  3).  Figures  4b  and  4c  depict  the  evolution 
of  the  reaction  time  and  the  number  of  incorrect  answers.  The  decrease  of  those 
metrics  indicates  improvement  on  the  task.  Similar  improvement  was  observed  for 
all  participants. 

The  participants  recognized  the  songs  and  identified  the  silence  periods  with  the 
same  probability.  Hence,  the  analysis  of  the  “no  answer”  situation  among  our  el¬ 
derly  participants  provides  us  with  additional  information.  From  our  experiments, 
we  noticed  that  the  average  rate  of  absence  of  response  to  silence  was  higher  than 
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Game  Level  Adaptation  Over  Time  -  User  #  1  -  SMMSE  Disability  Level  #  1 
31- 


Incorrectness  Evolution  Over  Time  -  User#1  -  SMMSE  Disablity  Level  #1 


(a) 


(b) 


Reaction  Time  Evolution  Over  Time  -  User  #  1  -  SMMSE  Disability  Level  #  1 


(c) 


Fig.  4:  Results:  (a)  Game  Level  Adaptation  and  Evolution  Over  Time  (6  months)  for  User  Id  1;  (b) 
Incorrectness  Evolution  Over  Time  (6  months)  for  User  Id  1;  (c)  Reaction  Time  Evolution  Over 
Time  (6  months)  for  User  Id  1 


the  average  rate  of  absence  of  response  to  songs,  and  that  this  phenomenon  in¬ 
creased  with  the  severity  of  the  cognitive  impairment.  Our  conjecture  is  that  music 
stimulates  the  interest  and  responsiveness  of  the  participants.  Another  interesting 
observation  that  deserves  more  study  is  the  users’  ability  to  participate  simultane¬ 
ously  in  different  tasks  (multitasking):  the  participants  were  able  to  sing  and  push 
the  correct  buttons  at  the  same  time.  This  is  notable  in  particular  for  participants 
with  cognitive  disability,  since  multitasking  requires  dividing  attention. 

In  summary,  our  social  robot  was  able  to  improve  or  maintain  the  cognitive  at¬ 
tention  of  users  with  Alzheimer’s  Disease  in  a  specific  music-based  cognitive  game. 
The  robot’s  capability  of  adapting  its  behavior  to  the  individual  user’s  level  of  dis¬ 
ability  helped  to  improve  the  user’s  task  performance  in  the  cognitive  game  over 
time. 
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5  Conclusions 


This  research  has  aimed  to  develop  adaptation  and  learning  methods  for  socially 
assistive  therapist  robots  that  can  provide  customized  physical  rehabilitation  and/or 
cognitive  stimulation.  We  have  presented  results  from  two  different  adaptation  and 
learning  approaches,  validated  with  healthy  adults  as  well  as  with  elderly  users  with 
Alzheimer’s  Disease.  Our  results  are  encouraging  in  light  of  our  pursuit  toward  cre¬ 
ating  personalized  socially  assistive  technologies  that  aim  to  improve  human  quality 
of  life. 
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